Abstract: Most semantic segmentation algorithms based on convolutional neural networks have large parameter counts and high computational complexity, which limits their use in real-time processing scenarios. To address this problem, a lightweight semantic segmentation algorithm based on a global-local context network (GLCNet) is proposed. The algorithm consists mainly of a global-local context (GLC) module and a multi-resolution fusion (MRF) module. The GLC module learns the global information and the local context information of the image and uses residual connections to strengthen the dependencies between features. On this basis, the MRF module aggregates features from different stages: low-resolution features are upsampled and then fused with high-resolution features to enhance the spatial information of the high-level features. On the Cityscapes and CamVid datasets, the algorithm achieves a mean intersection over union (mIoU) of 69.89% and 68.86%, respectively, at speeds of 87 frame/s and 122 frame/s on a single NVIDIA Titan V GPU. The experimental results show that the algorithm strikes a good balance among segmentation accuracy, efficiency, and parameter count, with only 0.68×10⁶ parameters.
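The fusion step described in the abstract (upsample a low-resolution feature map, then merge it with a high-resolution one) can be illustrated with a minimal pure-Python sketch. This is not the paper's MRF module: the nearest-neighbour upsampling, the element-wise addition, and the function names `upsample_nearest` and `fuse_add` are all assumptions chosen for illustration, and a real implementation would operate on multi-channel tensors with learned convolutions.

```python
def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows).

    Assumption: the paper may use a different interpolation scheme.
    """
    out = []
    for row in feature:
        # Repeat every value `factor` times along the width...
        wide = [v for v in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse_add(high_res, low_res):
    """Upsample `low_res` to the size of `high_res` and add element-wise.

    Assumption: fusion by addition; the spatial sizes must differ by an
    integer factor.
    """
    factor = len(high_res) // len(low_res)
    up = upsample_nearest(low_res, factor)
    return [[h + u for h, u in zip(h_row, u_row)]
            for h_row, u_row in zip(high_res, up)]


# Toy single-channel example: a 4x4 high-resolution map and a 2x2
# low-resolution map from a deeper stage.
high = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
low = [[10, 20],
       [30, 40]]

fused = fuse_add(high, low)
for row in fused:
    print(row)
```

Each 2×2 block of the fused map receives the same low-resolution value, so the coarse feature supplies context while the high-resolution map keeps the per-pixel detail.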
Table 1. Test results of different algorithms on the Cityscapes dataset

| Algorithm | Backbone | Parameters | Speed/(frame·s⁻¹) | mIoU/% |
|---|---|---|---|---|
| ENet[14] | None | 0.4×10⁶ | 76.9 | 58.3 |
| SegNet[40] | VGG16 | 29.5×10⁶ | 14.6 | 56.1 |
| ICNet[17] | PSPNet50 | 26.50×10⁶ | 30.3 | 69.5 |
| BiSeNet[20] | Xception39 | 5.80×10⁶ | 106 | 68.4 |
| FSSNet[41] | None | 0.2×10⁶ | 51 | 65.6 |
| SwiftNet[43] | MobileNetv2 | 2.4×10⁶ | 27.7 | 69.7 |
| EDANet[31] | None | 0.68×10⁶ | 81 | 67.3 |
| DFANet[18] | Xception | 4.8×10⁶ | 120 | 67.1 |
| ESNet[15] | None | 1.6×10⁶ | 41.7 | 69.1 |
| Fast-SCNN[37] | None | 1.11×10⁶ | 123 | 68.0 |
| LEDNet[22] | None | 0.91×10⁶ | 71 | 70.6 |
| CGNet[38] | None | 0.5×10⁶ | - | 64.8 |
| NDNet[42] | None | 0.5×10⁶ | 40 | 65.3 |
| CFPNet[45] | None | 0.55×10⁶ | 30 | 70.1 |
| BSDNet[44] | Xception | 1.2×10⁶ | 84.6 | 68.3 |
| BiSeNet V2[21] | None | 3.40×10⁶ | 156 | 72.6 |
| SGCPNet[13] | MobileNet | 0.61×10⁶ | 178.5 | 69.5 |
| Proposed algorithm | None | 0.68×10⁶ | 87 | 69.89 |
Table 2. Test results of different algorithms on the CamVid dataset

| Algorithm | Backbone | Parameters | mIoU/% |
|---|---|---|---|
| ENet[14] | None | 0.36×10⁶ | 51.3 |
| SegNet[40] | VGG16 | 29.50×10⁶ | 55.6 |
| BiSeNet[20] | Xception39 | - | 65.6 |
| BiSeNet[20] | ResNet18 | 49×10⁶ | 68.7 |
| DFANet[18] | Xception | 7.80×10⁶ | 64.7 |
| DABNet[23] | None | 0.76×10⁶ | 66.4 |
| CGNet[38] | None | 0.5×10⁶ | 65.6 |
| RGPNet[49] | None | 17.7×10⁶ | 66.9 |
| FDDWNet[47] | None | 0.8×10⁶ | 66.9 |
| LDPNet[48] | None | 0.8×10⁶ | 67.3 |
| LRNNet[50] | None | 0.67×10⁶ | 67.6 |
| HPNet[51] | None | - | 68.0 |
| BCPNet[52] | MobileNet | 0.61×10⁶ | 67.8 |
| BSDNet[44] | ResNet50 | 22.8×10⁶ | 67.8 |
| FBSNet[46] | None | 0.62×10⁶ | 68.9 |
| Proposed algorithm | None | 0.68×10⁶ | 68.86 |
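Table 3. Ablation experiment results

| Module | Fusion method (addition / concatenation / residual connection) | MRF | mIoU/% | Parameters |
|---|---|---|---|---|
| GLC | √ | | 66.31 | 0.80×10⁶ |
| GLC | √ | | 67.22 | 0.67×10⁶ |
| GLC | √ √ | | 67.39 | 0.67×10⁶ |
| (2,2,2,4,4,8,8,16,16) | √ √ | | 67.39 | 0.67×10⁶ |
| (2,2,2,2,4,4,8,8,16) | √ √ | | 67.25 | 0.67×10⁶ |
| (2,2,2,2,4,8,8,16,16) | √ √ | | 67.61 | 0.67×10⁶ |
| (2,2,2,2,2,4,8,8,16,16) | √ √ | | 67.49 | 0.69×10⁶ |
| (1,1,1,1,4,4,8,8,12) | √ √ | | 68.15 | 0.67×10⁶ |
| GLCNet | √ √ | √ | 68.86 | 0.68×10⁶ |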
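The mIoU figures reported in the tables are per-class intersection over union averaged across classes. A minimal pure-Python sketch of that metric follows; the function name `mean_iou`, the flat label lists, and the `ignore_index=255` default (a common convention for unlabeled pixels, assumed here rather than taken from the source) are illustrative choices, not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union over flat lists of class labels.

    IoU for a class = TP / (TP + FP + FN). Classes that never occur in
    either prediction or ground truth are left out of the average.
    Assumption: every predicted label lies in [0, num_classes).
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the score
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example with three classes; the last pixel is unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
miou = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {miou * 100:.2f}%")  # per-class IoU: 1/2, 2/3, 1
```

In practice the counts are accumulated over every pixel of the whole test set before the per-class ratios are taken, rather than averaged image by image.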
[1] SIAM M, GAMAL M, ABDEL-RAZEK M, et al. A comparative study of real-time semantic segmentation for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2018: 700-710.
[2] ZHENG Y X, HAO P Y, WU D E, et al. Medical image segmentation based on multi-layer features and spatial information distillation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(8): 1409-1417 (in Chinese).
[3] SHI W J, XU J W, ZHU D C, et al. RGB-D semantic segmentation and label-oriented voxelgrid fusion for accurate 3D semantic mapping[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(1): 183-197. doi: 10.1109/TCSVT.2021.3056726
[4] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
[5] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]//Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[6] YU C Q, WANG J B, GAO C X, et al. Context prior for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 12413-12422.
[7] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[8] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6230-6239.
[9] YUAN Y H, HUANG L, GUO J Y, et al. OCNet: Object context network for scene parsing[EB/OL]. (2021-03-15)[2022-09-01]. http://arxiv.org/abs/1809.00916.
[10] CHENG H K, CHUNG J, TAI Y W, et al. CascadePSP: Toward class-agnostic and very high-resolution segmentation via global and local refinement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 8887-8896.
[11] HE J J, DENG Z Y, ZHOU L, et al. Adaptive pyramid context network for semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7511-7520.
[12] NEKRASOV V, SHEN C H, REID I. Light-weight RefineNet for real-time semantic segmentation[EB/OL]. (2018-10-08)[2022-09-01]. http://arxiv.org/abs/1810.03272.
[13] HAO S J, ZHOU Y, GUO Y R, et al. Real-time semantic segmentation via spatial-detail guided context propagation[J/OL]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-12[2022-09-01]. https://ieeexplore.ieee.org/document/9729997. doi: 10.1109/TNNLS.2022.3154443
[14] PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. (2016-06-07)[2022-09-01]. http://arxiv.org/abs/1606.02147.
[15] WANG Y, ZHOU Q, XIONG J, et al. ESNet: An efficient symmetric network for real-time semantic segmentation[C]//Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision. Berlin: Springer, 2019: 41-52.
[16] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 561-580.
[17] ZHAO H S, QI X J, SHEN X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 418-434.
[18] LI H C, XIONG P F, FAN H Q, et al. DFANet: Deep feature aggregation for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 9514-9523.
[19] HONG Y D, PAN H H, SUN W C, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes[EB/OL]. (2021-09-01)[2022-09-01]. http://arxiv.org/abs/2101.06085.
[20] YU C Q, WANG J B, PENG C, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 334-349.
[21] YU C Q, GAO C X, WANG J B, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. doi: 10.1007/s11263-021-01515-2
[22] WANG Y, ZHOU Q, LIU J, et al. LEDNet: A lightweight encoder-decoder network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2019: 1860-1864.
[23] LI G, YUN I, KIM J, et al. DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation[EB/OL]. (2019-10-01)[2022-09-01]. http://arxiv.org/abs/1907.11357.
[24] GAO R. Rethinking dilated convolution for real-time semantic segmentation[EB/OL]. (2021-11-18)[2022-09-01]. http://arxiv.org/abs/2111.09957.
[25] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[26] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11531-11539.
[27] HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13708-13717.
[28] GAO Z L, XIE J T, WANG Q L, et al. Global second-order pooling convolutional networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3019-3028.
[29] HUANG Z L, WANG X G, HUANG L C, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 603-612.
[30] QIN Z Q, ZHANG P Y, WU F, et al. FcaNet: Frequency channel attention networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 763-772.
[31] LO S Y, HANG H M, CHAN S W, et al. Efficient dense modules of asymmetric convolution for real-time semantic segmentation[C]//Proceedings of the ACM Multimedia Asia. New York: ACM, 2019: 1-6.
[32] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2818-2826.
[33] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17)[2022-09-01]. http://arxiv.org/abs/1704.04861.
[34] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1-9.
[35] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 3213-3223.
[36] BROSTOW G J, SHOTTON J, FAUQUEUR J, et al. Segmentation and recognition using structure from motion point clouds[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2008: 44-57.
[37] POUDEL R P K, LIWICKI S, CIPOLLA R. Fast-SCNN: Fast semantic segmentation network[EB/OL]. (2019-02-12)[2022-09-01]. http://arxiv.org/abs/1902.04502.
[38] WU T Y, TANG S, ZHANG R, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169-1179. doi: 10.1109/TIP.2020.3042065
[39] ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized convNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1): 263-272.
[40] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[41] ZHANG X T, CHEN Z X, WU Q M J, et al. Fast semantic segmentation for scene perception[J]. IEEE Transactions on Industrial Informatics, 2019, 15(2): 1183-1192.
[42] YANG Z G, YU H S, FU Q, et al. NDNet: Narrow while deep network for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(9): 5508-5519. doi: 10.1109/TITS.2020.2987816
[43] ORŠIĆ M, KREŠO I, BEVANDIĆ P, et al. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 12599-12608.
[44] YE L, ZENG J X, YANG Y, et al. BSDNet: Balanced sample distribution network for real-time semantic segmentation of road scenes[J]. IEEE Access, 2021, 9: 84034-84044. doi: 10.1109/ACCESS.2021.3087510
[45] LOU A G, LOEW M. CFPNet: Channel-wise feature pyramid for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2021: 1894-1898.
[46] GAO G W, XU G A, LI J C, et al. FBSNet: A fast bilateral symmetrical network for real-time semantic segmentation[J]. IEEE Transactions on Multimedia, 2022, 25: 3273-3283.
[47] LIU J, ZHOU Q, QIANG Y, et al. FDDWNet: A lightweight convolutional neural network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2020: 2373-2377.
[48] HU X G, JING L Y. LDPNet: A lightweight densely connected pyramid network for real-time semantic segmentation[J]. IEEE Access, 2020, 8: 212647-212658.
[49] ARANI E, MARZBAN S, PATA A, et al. RGPNet: A real-time general purpose semantic segmentation[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 3008-3017.
[50] JIANG W H, XIE Z Z, LI Y Y, et al. LRNNet: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. Piscataway: IEEE Press, 2020: 1-6.
[51] DONG G S, YAN Y, SHEN C H, et al. Real-time high-performance semantic image segmentation of urban street scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(6): 3258-3274. doi: 10.1109/TITS.2020.2980426
[52] HAO S J, ZHOU Y, GUO Y R. Bi-direction context propagation network for real-time semantic segmentation[EB/OL]. (2022-03-19)[2022-09-01]. http://arxiv.org/abs/2005.11034.