Lightweight semantic segmentation algorithm based on GLCNet

MA Sugang^{1, 2}, CHEN Qimei¹, HOU Zhiqiang^{1, 2}, YANG Xiaobao^{1, 3}, ZHANG Zixian¹

1. School of Computer Science & Technology, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
2. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
3. Xi’an Key Laboratory of Big Data and Intelligent Computing, Xi’an University of Posts & Telecommunications, Xi’an 710121, China

doi: 10.13700/j.bh.1001-5965.2022.0822

Funds: National Natural Science Foundation of China (62072370); Science and Technology Project of Xi’an City (22GXFW0125)

Corresponding author: E-mail: msg@xupt.edu.cn

CLC number: TP391.4

Publication history
- Received: 2022-09-29
- Accepted: 2022-11-07
- Published online: 2022-11-30
- Issue published: 2024-11-30

Abstract

Most semantic segmentation algorithms based on convolutional neural networks have large parameter counts and high computational complexity, which limits their use in real-time processing scenarios. To address this problem, this paper proposes a lightweight semantic segmentation algorithm based on a global-local context network (GLCNet). The algorithm consists mainly of a global-local context (GLC) module and a multi-resolution fusion (MRF) module. The GLC module learns the global information and local context information of the image and uses residual connections to strengthen the dependencies between features. On this basis, the MRF module aggregates features from different stages: low-resolution features are upsampled and fused with high-resolution features to enhance the spatial information of the high-level features. On the Cityscapes and CamVid datasets, the algorithm reaches a mean intersection over union (mIoU) of 69.89% and 68.86%, respectively, at speeds of 87 frames/s and 122 frames/s on a single NVIDIA Titan V GPU. The experimental results show that the algorithm achieves a good balance among segmentation accuracy, efficiency, and parameter count, with only 0.68×10⁶ parameters.

Keywords:
- convolutional neural networks
- semantic segmentation
- context information
- feature fusion
- residual connections
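
The abstract reports accuracy as mean intersection over union (mIoU). As a reminder of how that metric is usually computed, here is a minimal pure-Python sketch. It assumes flattened integer label maps and the common convention of skipping classes that appear in neither the ground truth nor the prediction; the function names and the toy data are illustrative and are not taken from the paper's evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): six pixels, three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
miou = mean_iou(cm)  # per-class IoU is 1/3, 2/3 and 1/2
```

Benchmark toolkits typically accumulate one confusion matrix over the whole test set before averaging, rather than averaging per-image scores.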

Figure 1. Overall framework of GLCNet

Figure 2. GLC module

Figure 3. Visual comparison results on the Cityscapes dataset

Figure 4. Visual comparison results on the CamVid dataset

Table 1. Test results of different algorithms on the Cityscapes dataset

Algorithm	Backbone	Parameters	Speed/(frame·s⁻¹)	mIoU/%
ENet^[14]	None	0.4×10⁶	76.9	58.3
SegNet^[40]	VGG16	29.5×10⁶	14.6	56.1
ICNet^[17]	PSPNet50	26.50×10⁶	30.3	69.5
BiSeNet^[20]	Xception39	5.80×10⁶	106	68.4
FSSNet^[41]	None	0.2×10⁶	51	65.6
SwiftNet^[43]	MobileNetv2	2.4×10⁶	27.7	69.7
EDANet^[31]	None	0.68×10⁶	81	67.3
DFANet^[18]	Xception	4.8×10⁶	120	67.1
ESNet^[15]	None	1.6×10⁶	41.7	69.1
Fast-SCNN^[37]	None	1.11×10⁶	123	68.0
LEDNet^[22]	None	0.91×10⁶	71	70.6
CGNet^[38]	None	0.5×10⁶		64.8
NDNet^[42]	None	0.5×10⁶	40	65.3
CFPNet^[45]	None	0.55×10⁶	30	70.1
BSDNet^[44]	Xception	1.2×10⁶	84.6	68.3
BiSeNet V2^[21]	None	3.40×10⁶	156	72.6
SGCPNet^[13]	MobileNet	0.61×10⁶	178.5	69.5
Ours	None	0.68×10⁶	87	69.89

Table 2. Test results of different algorithms on the CamVid dataset

Algorithm	Backbone	Parameters	mIoU/%
ENet^[14]	None	0.36×10⁶	51.3
SegNet^[40]	VGG16	29.50×10⁶	55.6
BiSeNet^[20]	Xception39		65.6
BiSeNet^[20]	ResNet18	49×10⁶	68.7
DFANet^[18]	Xception	7.80×10⁶	64.7
DABNet^[23]	None	0.76×10⁶	66.4
CGNet^[38]	None	0.5×10⁶	65.6
RGPNet^[49]	None	17.7×10⁶	66.9
FDDWNet^[47]	None	0.8×10⁶	66.9
LDPNet^[48]	None	0.8×10⁶	67.3
LRNNet^[50]	None	0.67×10⁶	67.6
HPNet^[51]	None		68.0
BCPNet^[52]	MobileNet	0.61×10⁶	67.8
BSDNet^[44]	ResNet50	22.8×10⁶	67.8
FBSNet^[46]	None	0.62×10⁶	68.9
Ours	None	0.68×10⁶	68.86

Table 3. Ablation experiment results

Module	Fusion: add	Fusion: concat	Fusion: residual connection	MRF	mIoU/%	Parameters
GLC	√				66.31	0.80×10⁶
GLC		√			67.22	0.67×10⁶
GLC		√	√		67.39	0.67×10⁶
(2,2,2,4,4,8,8,16,16)		√	√		67.39	0.67×10⁶
(2,2,2,2,4,4,8,8,16)		√	√		67.25	0.67×10⁶
(2,2,2,2,4,8,8,16,16)		√	√		67.61	0.67×10⁶
(2,2,2,2,2,4,8,8,16,16)		√	√		67.49	0.69×10⁶
(1,1,1,1,4,4,8,8,12)		√	√		68.15	0.67×10⁶
GLCNet		√	√	√	68.86	0.68×10⁶
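
Table 3 compares addition and concatenation as fusion options, and the abstract describes the MRF module as upsampling low-resolution features before fusing them with high-resolution ones. The following pure-Python sketch illustrates those two generic operations on tiny single-channel maps. It is an illustration under assumptions, not the paper's implementation: this page specifies neither the upsampling method nor the MRF fusion operator, so nearest-neighbour upsampling and all function names here are hypothetical.

```python
def upsample_nearest(fmap, scale=2):
    """Nearest-neighbour upsampling of a 2D map (list of rows) by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(scale)]    # repeat each value horizontally
        out.extend(list(wide) for _ in range(scale))     # repeat each row vertically
    return out


def fuse_add(a, b):
    """Element-wise addition; both maps must have the same height and width."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_concat(a, b):
    """Channel concatenation: stack the two maps as separate channels."""
    return [a, b]


# Hypothetical feature maps: one 2x2 low-resolution map, one 4x4 high-resolution map.
low = [[1, 2],
       [3, 4]]
high = [[10] * 4 for _ in range(4)]

up = upsample_nearest(low)        # 4x4, matches the high-resolution map
added = fuse_add(up, high)        # one channel, same spatial size
stacked = fuse_concat(up, high)   # two channels, same spatial size
```

Addition keeps the channel count unchanged, while concatenation doubles it and usually needs a following convolution to mix the stacked channels; how this affects the parameter counts in Table 3 depends on the channel widths used in the paper.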

References (52)

[1]	SIAM M, GAMAL M, ABDEL-RAZEK M, et al. A comparative study of real-time semantic segmentation for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2018: 700-710.
[2]	ZHENG Y X, HAO P Y, WU D E, et al. Medical image segmentation based on multi-layer features and spatial information distillation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(8): 1409-1417 (in Chinese).
[3]	SHI W J, XU J W, ZHU D C, et al. RGB-D semantic segmentation and label-oriented voxelgrid fusion for accurate 3D semantic mapping[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(1): 183-197. doi: 10.1109/TCSVT.2021.3056726
[4]	LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
[5]	RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]//Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[6]	YU C Q, WANG J B, GAO C X, et al. Context prior for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 12413-12422.
[7]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[8]	ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6230-6239.
[9]	YUAN Y H, HUANG L, GUO J Y, et al. OCNet: Object context network for scene parsing[EB/OL]. (2021-03-15)[2022-09-01].
[10]	CHENG H K, CHUNG J, TAI Y W, et al. CascadePSP: Toward class-agnostic and very high-resolution segmentation via global and local refinement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 8887-8896.
[11]	HE J J, DENG Z Y, ZHOU L, et al. Adaptive pyramid context network for semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7511-7520.
[12]	NEKRASOV V, SHEN C H, REID I. Light-weight RefineNet for real-time semantic segmentation[EB/OL]. (2018-10-08)[2022-09-01].
[13]	HAO S J, ZHOU Y, GUO Y R, et al. Real-time semantic segmentation via spatial-detail guided context propagation[J/OL]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-12[2022-09-01].
[14]	PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. (2016-06-07)[2022-09-01].
[15]	WANG Y, ZHOU Q, XIONG J, et al. ESNet: An efficient symmetric network for real-time semantic segmentation[C]//Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision. Berlin: Springer, 2019: 41-52.
[16]	MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 561-580.
[17]	ZHAO H S, QI X J, SHEN X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 418-434.
[18]	LI H C, XIONG P F, FAN H Q, et al. DFANet: Deep feature aggregation for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 9514-9523.
[19]	HONG Y D, PAN H H, SUN W C, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes[EB/OL]. (2021-09-01)[2022-09-01].
[20]	YU C Q, WANG J B, PENG C, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 334-349.
[21]	YU C Q, GAO C X, WANG J B, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. doi: 10.1007/s11263-021-01515-2
[22]	WANG Y, ZHOU Q, LIU J, et al. LEDNet: A lightweight encoder-decoder network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2019: 1860-1864.
[23]	LI G, YUN I, KIM J, et al. DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation[EB/OL]. (2019-10-01)[2022-09-01].
[24]	GAO R. Rethinking dilated convolution for real-time semantic segmentation[EB/OL]. (2021-11-18)[2022-09-01].
[25]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[26]	WANG Q L, WU B G, ZHU P F, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11531-11539.
[27]	HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13708-13717.
[28]	GAO Z L, XIE J T, WANG Q L, et al. Global second-order pooling convolutional networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3019-3028.
[29]	HUANG Z L, WANG X G, HUANG L C, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 603-612.
[30]	QIN Z Q, ZHANG P Y, WU F, et al. FcaNet: Frequency channel attention networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 763-772.
[31]	LO S Y, HANG H M, CHAN S W, et al. Efficient dense modules of asymmetric convolution for real-time semantic segmentation[C]//Proceedings of the ACM Multimedia Asia. New York: ACM, 2019: 1-6.
[32]	SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2818-2826.
[33]	HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17)[2022-09-01].
[34]	SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1-9.
[35]	CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 3213-3223.
[36]	BROSTOW G J, SHOTTON J, FAUQUEUR J, et al. Segmentation and recognition using structure from motion point clouds[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2008: 44-57.
[37]	POUDEL R P K, LIWICKI S, CIPOLLA R. Fast-SCNN: Fast semantic segmentation network[EB/OL]. (2019-02-12)[2022-09-01].
[38]	WU T Y, TANG S, ZHANG R, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169-1179. doi: 10.1109/TIP.2020.3042065
[39]	ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized convNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1): 263-272.
[40]	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[41]	ZHANG X T, CHEN Z X, WU Q M J, et al. Fast semantic segmentation for scene perception[J]. IEEE Transactions on Industrial Informatics, 2019, 15(2): 1183-1192.
[42]	YANG Z G, YU H S, FU Q, et al. NDNet: Narrow while deep network for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(9): 5508-5519. doi: 10.1109/TITS.2020.2987816
[43]	ORŠIĆ M, KREŠO I, BEVANDIĆ P, et al. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 12599-12608.
[44]	YE L, ZENG J X, YANG Y, et al. BSDNet: Balanced sample distribution network for real-time semantic segmentation of road scenes[J]. IEEE Access, 2021, 9: 84034-84044. doi: 10.1109/ACCESS.2021.3087510
[45]	LOU A G, LOEW M. CFPNet: Channel-wise feature pyramid for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2021: 1894-1898.
[46]	GAO G W, XU G A, LI J C, et al. FBSNet: A fast bilateral symmetrical network for real-time semantic segmentation[J]. IEEE Transactions on Multimedia, 2022, 25: 3273-3283.
[47]	LIU J, ZHOU Q, QIANG Y, et al. FDDWNet: A lightweight convolutional neural network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2020: 2373-2377.
[48]	HU X G, JING L Y. LDPNet: A lightweight densely connected pyramid network for real-time semantic segmentation[J]. IEEE Access, 2020, 8: 212647-212658.
[49]	ARANI E, MARZBAN S, PATA A, et al. RGPNet: A real-time general purpose semantic segmentation[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 3008-3017.
[50]	JIANG W H, XIE Z Z, LI Y Y, et al. LRNNet: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. Piscataway: IEEE Press, 2020: 1-6.
[51]	DONG G S, YAN Y, SHEN C H, et al. Real-time high-performance semantic image segmentation of urban street scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(6): 3258-3274. doi: 10.1109/TITS.2020.2980426
[52]	HAO S J, ZHOU Y, GUO Y R. Bi-direction context propagation network for real-time semantic segmentation[EB/OL]. (2022-03-19)[2022-09-01].