Two-branch real-time semantic segmentation algorithm based on spatial information guidance
-
Abstract: To address the loss of spatial information in features caused by the heavy parameter reduction of real-time semantic segmentation models, and the inaccurate category prediction caused by the lack of contextual information in features, a two-branch real-time semantic segmentation algorithm based on spatial information guidance was proposed. The algorithm adopted a two-branch structure to extract the spatial information and the semantic information of features separately. To better retain the spatial information, a spatial guided module (SGM) was designed to capture both the local information and the surrounding contextual information of features and to assign higher weights to important information through channel weighting, which effectively compensated for the information loss of high-resolution features during downsampling. To further strengthen the contextual representation of features, a pooling feature enhancement module (PFEM) was designed: pooling kernels of different sizes were used to capture multi-scale feature information, and strip-shaped pooling kernels were used to model long-range dependencies between features, so that the category of each segmented region could be determined more reliably. The proposed algorithm was evaluated on the Cityscapes and CamVid datasets, reaching a mean intersection over union of 77.4% and 74.0% and an inference speed of 49.1 frame/s and 124.5 frame/s, respectively. The accuracy was effectively improved while real-time performance was maintained, and good semantic segmentation performance was achieved.
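The abstract describes the two modules only at a high level. The following PyTorch sketch is a minimal illustration of those two ideas, not the paper's implementation: the class names (SpatialGuideSketch, PoolingEnhanceSketch), the layer choices (a dilated convolution for surrounding context, an SE-style gate for channel weighting, pooling sizes 2/4/8), and the fusion strategy are all assumptions made for illustration.

```python
# Hedged sketch of the two ideas described in the abstract; all module names,
# layer sizes, and fusion choices are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialGuideSketch(nn.Module):
    """Channel-weighted fusion of local detail and surrounding context (SGM-like idea)."""

    def __init__(self, channels: int):
        super().__init__()
        # local information: plain 3x3 convolution
        self.local = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        # surrounding context: dilated 3x3 convolution with a larger receptive field (assumed)
        self.context = nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False)
        # channel weighting: squeeze-and-excitation style gate over the concatenated branches
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.local(x), self.context(x)], dim=1)
        # important channels receive higher weights; the residual keeps the original detail
        return self.fuse(feats) * self.gate(feats) + x


class PoolingEnhanceSketch(nn.Module):
    """Multi-scale pooling plus strip pooling for long-range context (PFEM-like idea)."""

    def __init__(self, channels: int, pool_sizes=(2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.branch_convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 1, bias=False) for _ in pool_sizes]
        )
        # strip branches: pool to H x 1 / 1 x W, then convolve along the remaining axis
        self.strip_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.strip_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.project = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        out = x
        # square pooling kernels of different sizes capture multi-scale context
        for size, conv in zip(self.pool_sizes, self.branch_convs):
            y = conv(F.adaptive_avg_pool2d(x, size))
            out = out + F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False)
        # strip-shaped pooling models long-range dependencies along rows and columns
        row = self.strip_h(F.adaptive_avg_pool2d(x, (h, 1)))   # B x C x H x 1
        col = self.strip_w(F.adaptive_avg_pool2d(x, (1, w)))   # B x C x 1 x W
        out = out + row.expand(-1, -1, -1, w) + col.expand(-1, -1, h, -1)
        return self.project(out)


if __name__ == "__main__":
    x = torch.randn(1, 64, 64, 128)
    y = PoolingEnhanceSketch(64)(SpatialGuideSketch(64)(x))
    print(y.shape)  # torch.Size([1, 64, 64, 128])
```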
-
Table 1. Ablation experiment results of SGM

| Baseline | SGM(1) | SGM(2) | SGM(3) | mIoU/% | Speed/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| √ |  |  |  | 75.8 | 47.3 |
| √ | √ |  |  | 76.3 | 48.5 |
| √ | √ | √ |  | 76.6 | 50.6 |
| √ | √ | √ | √ | 76.9 | 51.0 |

Table 2. Ablation experiment results of PFEM

| Baseline | PFEM | mIoU/% | Speed/(frame·s⁻¹) |
| --- | --- | --- | --- |
| √ |  | 75.8 | 47.3 |
| √ | √ | 76.8 | 48.2 |

Table 3. Comparison experiment results of different pooling operations

| Pool1 | Pool2 | Pool3 | Pool4 | mIoU/% | Speed/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| Avg | Avg | Avg | Avg | 76.77 | 47.3 |
| Max | Max | Max | Max | 76.43 | 49.5 |
| Avg | Avg | Max | Max | 76.54 | 49.1 |
| Max | Max | Avg | Avg | 76.84 | 48.2 |

Table 4. Ablation experiment results of different modules

| Baseline | SGM | PFEM | mIoU/% | Speed/(frame·s⁻¹) | Parameters |
| --- | --- | --- | --- | --- | --- |
| √ |  |  | 75.8 | 47.3 | 3.40×10⁶ |
| √ | √ |  | 76.9 | 51.0 | 3.43×10⁶ |
| √ |  | √ | 76.8 | 48.2 | 3.21×10⁶ |
| √ | √ | √ | 77.4 | 49.1 | 3.24×10⁶ |

Table 5. Comparison of different algorithms on the Cityscapes validation set

| Algorithm | Basenet | Resolution | mIoU/% | Speed/(frame·s⁻¹) |
| --- | --- | --- | --- | --- |
| ENet[3] |  | 512×1024 | 58.3 | 76.9 |
| ESPNet[9] | ESPNet | 512×1024 | 60.3 | 112.9 |
| ERFNet[18] |  | 512×1024 | 70.0 | 41.7 |
| ICNet[4] | PSPNet50 | 1024×2048 | 69.5 | 30.3 |
| BiSeNet[7] | ResNet18 | 768×1536 | 74.8 | 65.05 |
| Fast-SCNN[19] |  | 512×1024 | 68.6 | 123.5 |
| DABNet[20] |  | 1024×2048 | 70.1 | 27.7 |
| DFANet A′[5] | XceptionA | 1024×2048 | 71.3 | 100 |
| BiSeNet V2[11] |  | 512×1024 | 75.8 | 47.3 |
| STDC1-Seg75[24] | STDC1 | 768×1536 | 74.5 | 126.7 |
| STDC2-Seg75[24] | STDC2 | 768×1536 | 77.0 | 97.0 |
| FBSNet[12] |  | 512×1024 | 70.9 | 90 |
| HyperSeg-M[21] | EfficientNet-B1 | 512×1024 | 76.2 | 36.9 |
| RELAXNet[22] |  | 512×1024 | 74.8 | 64 |
| FPANet C[23] | ResNet18 | 512×1024 | 75.9 | 31 |
| Ours |  | 512×1024 | 77.4 | 49.1 |

Table 6. Comparison of different models on the CamVid test set

| Algorithm | Basenet | Resolution | mIoU/% | Speed/(frame·s⁻¹) |
| --- | --- | --- | --- | --- |
| ENet[3] |  | 960×720 | 51.3 | 61.2 |
| ICNet[4] | PSPNet50 | 960×720 | 67.1 | 27.8 |
| BiSeNet[7] | ResNet18 | 960×720 | 68.7 | 116.3 |
| DFANet A′[5] | XceptionA | 960×720 | 64.7 | 120 |
| CAS[25] |  | 960×720 | 71.2 | 169 |
| GAS[26] |  | 960×720 | 72.8 | 153 |
| LRNNet C[27] |  | 960×720 | 69.2 | 76.5 |
| BiSeNet V2[11] |  | 960×720 | 72.4 | 124.5 |
| STDC1-Seg75[24] |  | 960×720 | 73.0 | 197.6 |
| STDC2-Seg75[24] |  | 960×720 | 73.9 | 152.2 |
| RELAXNet[22] |  | 960×720 | 71.2 | 79 |
| FPANet B[23] |  | 960×720 | 72.9 | 88 |
| Ours |  | 960×720 | 74.0 | 124.5 |

-
[1] BAO Y T, LIU W, LI R S, et al. Spatial enhanced attention U-type network for semantic segmentation of remote sensing images[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(7): 1828-1837 (in Chinese).
[2] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
[3] PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. (2016-06-07)[2022-12-01]. https://arxiv.org/abs/1606.02147.
[4] ZHAO H S, QI X J, SHEN X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin: Springer, 2020: 418-434.
[5] LI H C, XIONG P F, FAN H Q, et al. DFANet: Deep feature aggregation for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 9514-9523.
[6] WANG H C, JIANG X L, REN H B, et al. SwiftNet: Real-time video object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 1296-1305.
[7] YU C Q, WANG J B, PENG C, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 334-349.
[8] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1800-1807.
[9] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 561-580.
[10] WANG Y, ZHOU Q, LIU J, et al. LEDNet: A lightweight encoder-decoder network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2019: 1860-1864.
[11] YU C Q, GAO C X, WANG J B, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. doi: 10.1007/s11263-021-01515-2
[12] GAO G W, XU G A, LI J C, et al. FBSNet: A fast bilateral symmetrical network for real-time semantic segmentation[J]. IEEE Transactions on Multimedia, 2022, 25: 3273-3283.
[13] HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13708-13717.
[14] WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[15] HOU Q B, ZHANG L, CHENG M M, et al. Strip pooling: Rethinking spatial pooling for scene parsing[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4002-4011.
[16] WU T Y, TANG S, ZHANG R, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169-1179. doi: 10.1109/TIP.2020.3042065
[17] HONG Y D, PAN H H, SUN W C, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes[EB/OL]. (2021-09-01)[2022-12-01]. https://arxiv.org/abs/2101.06085.
[18] ROMERA E, ÁLVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. doi: 10.1109/TITS.2017.2750080
[19] POUDEL R P K, LIWICKI S, CIPOLLA R. Fast-SCNN: Fast semantic segmentation network[EB/OL]. (2019-02-12)[2022-12-01]. https://arxiv.org/abs/1902.04502.
[20] LI G, YUN I, KIM J, et al. DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation[EB/OL]. (2019-10-01)[2022-12-01]. https://arxiv.org/abs/1907.11357.
[21] NIRKIN Y, WOLF L, HASSNER T. HyperSeg: Patch-wise hypernetwork for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 4060-4069.
[22] LIU J, XU X Q, SHI Y Q, et al. RELAXNet: Residual efficient learning and attention expected fusion network for real-time semantic segmentation[J]. Neurocomputing, 2022, 474: 115-127. doi: 10.1016/j.neucom.2021.12.003
[23] WU Y, JIANG J Y, HUANG Z M, et al. FPANet: Feature pyramid aggregation network for real-time semantic segmentation[J]. Applied Intelligence, 2022, 52(3): 3319-3336. doi: 10.1007/s10489-021-02603-z
[24] FAN M Y, LAI S Q, HUANG J S, et al. Rethinking BiSeNet for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 9711-9720.
[25] ZHANG Y H, QIU Z F, LIU J G, et al. Customizable architecture search for semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 11633-11642.
[26] LIN P W, SUN P, CHENG G L, et al. Graph-guided architecture search for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4202-4211.
[27] JIANG W H, XIE Z Z, LI Y Y, et al. LRNNet: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. Piscataway: IEEE Press, 2020: 1-6.
-

