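Two-branch real-time semantic segmentation algorithm based on spatial information guidance

HOU Zhiqiang, DAI Nan, CHENG Minjie, LI Fucheng, MA Sugang, FAN Jiulun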

Citation: HOU Z Q,DAI N,CHENG M J,et al. Two-branch real-time semantic segmentation algorithm based on spatial information guidance[J]. Journal of Beijing University of Aeronautics and Astronautics,2025,51(1):19-29 (in Chinese) doi: 10.13700/j.bh.1001-5965.2022.0980

doi: 10.13700/j.bh.1001-5965.2022.0980
    Corresponding author: E-mail: hou-zhq@sohu.com

  • CLC number: TP391.4

Two-branch real-time semantic segmentation algorithm based on spatial information guidance

Funds: National Natural Science Foundation of China (62072370)

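  • Abstract:

    Real-time semantic segmentation models cut parameters heavily, which causes a loss of spatial information in the features, and features that lack context lead to inaccurate class predictions. To address both problems, a two-branch real-time semantic segmentation algorithm guided by spatial information is proposed. The algorithm uses a two-branch structure to extract the spatial information and the semantic information of the features separately. To better preserve spatial information, a spatial guidance module (SGM) is designed that captures both the local information of a feature and its surrounding context, and applies channel weighting to give important information higher weights, which effectively compensates for the information lost when high-resolution image features are downsampled. To further strengthen the contextual representation of the features, a pooling feature enhancement module (PFEM) is designed that uses pooling kernels of different sizes to capture multi-scale feature information and strip pooling kernels to model long-range dependencies between features, so that the class of each segmented region is determined more reliably. The algorithm is evaluated on the Cityscapes and CamVid datasets, where it reaches a mean intersection over union (mIoU) of 77.4% and 74.0% at 49.1 frame/s and 124.5 frame/s, respectively. It improves accuracy while remaining real-time and achieves good semantic segmentation performance.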
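To make the strip pooling idea mentioned in the abstract more concrete, the following is a minimal pure-Python sketch, not the authors' PFEM. It assumes a single-channel feature map stored as a list of rows, averages it with a 1×W strip (one value per row) and an H×1 strip (one value per column), and fuses the two by addition followed by a sigmoid gate. The function names `strip_pool` and `strip_pool_gate` are hypothetical, and the learned 1D convolutions used in the original strip pooling design (reference [15]) are deliberately omitted.

```python
import math


def strip_pool(feature):
    """Average a 2D feature map along each row and each column.

    Returns (row_means, col_means): one value per row (1 x W strip)
    and one value per column (H x 1 strip).
    """
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]
    col_means = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def strip_pool_gate(feature):
    """Re-weight every position by a gate built from its row and column strips.

    Assumption: the two strips are fused by simple addition and passed
    through a sigmoid; a trained module would use learned convolutions.
    """
    row_means, col_means = strip_pool(feature)
    gated = []
    for i, row in enumerate(feature):
        out_row = []
        for j, value in enumerate(row):
            fused = row_means[i] + col_means[j]
            gate = 1.0 / (1.0 + math.exp(-fused))
            out_row.append(value * gate)
        gated.append(out_row)
    return gated


feature = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
rows, cols = strip_pool(feature)
gated = strip_pool_gate(feature)
```

Because each gate value depends on an entire row and an entire column, a position is influenced by distant positions along both axes, which is the long-range dependency the abstract refers to.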

     

  • Figure 1.  Flow chart of two-branch real-time semantic segmentation algorithm based on spatial information guidance

    Figure 2.  Spatial guidance module

    Figure 3.  Pooling feature enhancement module

    Figure 4.  Comparison of visualization results of different algorithms on Cityscapes dataset

    Figure 5.  Comparison of visualization results of different algorithms on CamVid dataset

    Figure 6.  Comparison of accuracy-speed parameters on Cityscapes dataset

    Figure 7.  Comparison of accuracy-speed parameters on CamVid dataset

    Table 1.  Ablation results for the spatial guidance module (SGM)

    Baseline  SGM(1)  SGM(2)  SGM(3)  mIoU/%  Speed/(frame·s⁻¹)
                                      75.8    47.3
                                      76.3    48.5
                                      76.6    50.6
                                      76.9    51.0

    Table 2.  Ablation results for the pooling feature enhancement module (PFEM)

    Baseline  PFEM  mIoU/%  Speed/(frame·s⁻¹)
                    75.8    47.3
                    76.8    48.2

    Table 3.  Comparison of different pooling operations

    Pool1  Pool2  Pool3  Pool4  mIoU/%  Speed/(frame·s⁻¹)
    Avg    Avg    Avg    Avg    76.77   47.3
    Max    Max    Max    Max    76.43   49.5
    Avg    Avg    Max    Max    76.54   49.1
    Max    Max    Avg    Avg    76.84   48.2

    Table 4.  Results of ablation experiments for different modules

    Baseline  SGM  PFEM  mIoU/%  Speed/(frame·s⁻¹)  Parameters
                         75.8    47.3               3.40×10⁶
                         76.9    51.0               3.43×10⁶
                         76.8    48.2               3.21×10⁶
                         77.4    49.1               3.24×10⁶

    Table 5.  Comparison of different algorithms on Cityscapes validation set

    Algorithm        Basenet          Resolution  mIoU/%  Speed/(frame·s⁻¹)
    ENet[3]                           512×1024    58.3    76.9
    ESPNet[9]        ESPNet           512×1024    60.3    112.9
    ERFNet[18]                        512×1024    70.0    41.7
    ICNet[4]         PSPNet50         1024×2048   69.5    30.3
    BiSeNet[7]       ResNet18         768×1536    74.8    65.05
    Fast-SCNN[19]                     512×1024    68.6    123.5
    DABNet[20]                        1024×2048   70.1    27.7
    DFANet A′[5]     XceptionA        1024×2048   71.3    100
    BiSeNet V2[11]                    512×1024    75.8    47.3
    STDC1-Seg75[24]  STDC1            768×1536    74.5    126.7
    STDC2-Seg75[24]  STDC2            768×1536    77.0    97.0
    FBSNet[12]                        512×1024    70.9    90
    HyperSeg-M[21]   EfficientNet-B1  512×1024    76.2    36.9
    RELAXNet[22]                      512×1024    74.8    64
    FPANet C[23]     ResNet18         512×1024    75.9    31
    Ours                              512×1024    77.4    49.1

    Table 6.  Comparison of different models on CamVid test set

    Algorithm        Basenet    Resolution  mIoU/%  Speed/(frame·s⁻¹)
    ENet[3]                     960×720     51.3    61.2
    ICNet[4]         PSPNet50   960×720     67.1    27.8
    BiSeNet[7]       ResNet18   960×720     68.7    116.3
    DFANet A′[5]     XceptionA  960×720     64.7    120
    CAS[25]                     960×720     71.2    169
    GAS[26]                     960×720     72.8    153
    LRNNet C[27]                960×720     69.2    76.5
    BiSeNet V2[11]              960×720     72.4    124.5
    STDC1-Seg75[24]             960×720     73.0    197.6
    STDC2-Seg75[24]             960×720     73.9    152.2
    RELAXNet[22]                960×720     71.2    79
    FPANet B[23]                960×720     72.9    88
    Ours                        960×720     74.0    124.5
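The accuracy figures in the tables above are mean intersection over union (mIoU) values. As a reminder of how this metric is commonly computed, here is a minimal pure-Python sketch. It is not the authors' evaluation code: it assumes flattened lists of integer class labels, has no ignore-label handling of the kind dataset toolkits usually provide, and the helper names `confusion_matrix` and `mean_iou` are hypothetical.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels by (true class, predicted class)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average the per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the prediction nor the ground truth
    are skipped so they do not distort the mean.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(cm)
```

In this toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.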
  • [1] BAO Y T, LIU W, LI R S, et al. Spatial enhanced attention U-type network for semantic segmentation of remote sensing images[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(7): 1828-1837 (in Chinese).
    [2] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
    [3] PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. (2016-06-07)[2022-12-01]. https://arxiv.org/abs/1606.02147.
    [4] ZHAO H S, QI X J, SHEN X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin: Springer, 2018: 418-434.
    [5] LI H C, XIONG P F, FAN H Q, et al. DFANet: Deep feature aggregation for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 9514-9523.
    [6] WANG H C, JIANG X L, REN H B, et al. SwiftNet: Real-time video object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 1296-1305.
    [7] YU C Q, WANG J B, PENG C, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 334-349.
    [8] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1800-1807.
    [9] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 561-580.
    [10] WANG Y, ZHOU Q, LIU J, et al. LedNet: A lightweight encoder-decoder network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2019: 1860-1864.
    [11] YU C Q, GAO C X, WANG J B, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. doi: 10.1007/s11263-021-01515-2
    [12] GAO G W, XU G A, LI J C, et al. FBSNet: A fast bilateral symmetrical network for real-time semantic segmentation[J]. IEEE Transactions on Multimedia, 2022, 25: 3273-3283.
    [13] HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13708-13717.
    [14] WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
    [15] HOU Q B, ZHANG L, CHENG M M, et al. Strip pooling: Rethinking spatial pooling for scene parsing[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4002-4011.
    [16] WU T Y, TANG S, ZHANG R, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169-1179. doi: 10.1109/TIP.2020.3042065
    [17] HONG Y D, PAN H H, SUN W C, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes[EB/OL]. (2021-09-01)[2022-12-01]. https://arxiv.org/abs/2101.06085.
    [18] ROMERA E, ÁLVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. doi: 10.1109/TITS.2017.2750080
    [19] POUDEL R P K, LIWICKI S, CIPOLLA R. Fast-SCNN: Fast semantic segmentation network[EB/OL]. (2019-02-12)[2022-12-01]. https://arxiv.org/abs/1902.04502.
    [20] LI G, YUN I, KIM J, et al. DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation[EB/OL]. (2019-10-01)[2022-12-01]. https://arxiv.org/abs/1907.11357.
    [21] NIRKIN Y, WOLF L, HASSNER T. HyperSeg: Patch-wise hypernetwork for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 4060-4069.
    [22] LIU J, XU X Q, SHI Y Q, et al. RELAXNet: Residual efficient learning and attention expected fusion network for real-time semantic segmentation[J]. Neurocomputing, 2022, 474: 115-127. doi: 10.1016/j.neucom.2021.12.003
    [23] WU Y, JIANG J Y, HUANG Z M, et al. FPANet: Feature pyramid aggregation network for real-time semantic segmentation[J]. Applied Intelligence, 2022, 52(3): 3319-3336. doi: 10.1007/s10489-021-02603-z
    [24] FAN M Y, LAI S Q, HUANG J S, et al. Rethinking BiSeNet for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 9711-9720.
    [25] ZHANG Y H, QIU Z F, LIU J G, et al. Customizable architecture search for semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 11633-11642.
    [26] LIN P W, SUN P, CHENG G L, et al. Graph-guided architecture search for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4202-4211.
    [27] JIANG W H, XIE Z Z, LI Y Y, et al. LRNNet: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. Piscataway: IEEE Press, 2020: 1-6.
Publication history
  • Received:  2022-12-10
  • Accepted:  2023-06-02
  • Published online:  2023-07-03
  • Issue published:  2025-01-31
