Semantic segmentation of remote sensing images based on U-shaped network combined with spatial enhance attention

BAO Yintu^{1, 2},
LIU Wei^{1},
LI Runsheng^{1},
LI Qin^{1},
HU Qing^{1}

1. School of Data and Target Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
2. Unit 31401 of PLA, Hohhot 010051, China

doi: 10.13700/j.bh.1001-5965.2021.0544

Funds: National Natural Science Foundation of China (41901378)

Corresponding author: E-mail: greatliuliu@163.com

CLC number: TP751.1; V19

Publication history
- Received: 2021-09-10
- Accepted: 2022-02-25
- Published online: 2022-03-18
- Issue published: 2023-07-31

Abstract

Deep-learning-based semantic segmentation models still segment small-sized objects and object boundaries inaccurately when parsing remote sensing images. To address this problem, we propose a U-shaped network, SGE-Unet. First, the network structure is optimized to strengthen feature extraction. Second, a spatial group enhance (SGE) attention module is fused into the network to improve its parsing of contextual semantic information. Finally, a median frequency balance cross-entropy loss function is used to suppress the effect of the unbalanced class distribution. Experiments on two datasets show that the overall accuracy (OA), mean intersection over union (mIoU), mean F₁ score ($\overline{F}_1$), and Kappa coefficient of SGE-Unet are all higher than those of mainstream models. On the Vaihingen dataset, the intersection over union (IoU) and F₁ score of the small-sized car class reach 0.719 and 0.901, which are 16% and 11% higher than those of the second-best model. The experimental results show that the proposed model improves the segmentation of easily confused objects, small-sized objects, and object boundaries.

Keywords:
- remote sensing image
- semantic segmentation
- deep learning
- attention
- loss function
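
The median frequency balance cross-entropy loss named in the abstract is not spelled out on this page. As a minimal sketch, assuming the commonly used definition of median frequency balancing (a class weight is the median of all class frequencies divided by that class's frequency, where a class's frequency is its pixel count divided by the total pixels of the images that contain it), the weights and a weighted per-pixel loss could be computed as follows. The function names `median_frequency_weights` and `weighted_cross_entropy` and the toy label maps are hypothetical and are not taken from the paper.

```python
import math
from statistics import median


def median_frequency_weights(label_maps, num_classes):
    """Return one weight per class: median(freq) / freq[c].

    label_maps: list of images, each a flat list of integer class ids.
    Assumed definition: freq[c] = pixels of class c / total pixels of
    the images in which class c appears.
    """
    pixel_count = [0] * num_classes   # pixels labelled c over all images
    image_pixels = [0] * num_classes  # total pixels of images containing c
    for labels in label_maps:
        for c in labels:
            pixel_count[c] += 1
        for c in set(labels):
            image_pixels[c] += len(labels)
    freq = [pixel_count[c] / image_pixels[c] if image_pixels[c] else 0.0
            for c in range(num_classes)]
    med = median(f for f in freq if f > 0)
    # Classes that never appear get weight 0 so they do not affect the loss.
    return [med / f if f > 0 else 0.0 for f in freq]


def weighted_cross_entropy(probs, target, weights):
    """Weighted cross-entropy for one pixel; probs is a softmax output."""
    return -weights[target] * math.log(probs[target])


# Toy example: class 1 is rare, so it receives the largest weight.
image_a = [0, 0, 0, 0, 0, 0, 1, 1]
image_b = [0, 0, 0, 0, 2, 2, 2, 2]
weights = median_frequency_weights([image_a, image_b], num_classes=3)
loss = weighted_cross_entropy([0.5, 0.25, 0.25], target=1, weights=weights)
print(weights, loss)
```

Rare classes (such as cars in urban aerial scenes) receive weights above 1 under this scheme, so their misclassified pixels contribute more to the loss than pixels of dominant classes.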

Figure 1. Structure of SGE-Unet

Figure 2. Seam after splicing

Figure 3. Splicing method

Figure 4. Global segmentation results of SGE-Unet

Figure 5. Feature heat map and local segmentation results on Vaihingen dataset

Figure 6. Feature heat map and local segmentation results on Potsdam dataset

Table 1. Allocation of datasets

Subset	Vaihingen	Potsdam
Training set	1, 3, 11, 13, 15, 17, 21, 26, 28, 32, 34, 37	2_12, 3_10, 3_11, 3_12, 4_11, 4_12, 5_10, 5_12, 6_7, 6_8, 6_9, 6_10, 6_12, 7_7, 7_9, 7_10, 7_11, 7_12
Validation set	5, 7, 23, 30	2_11, 4_10, 5_11, 7_8
Test set	2, 4, 6, 8, 10, 12, 14, 16, 20, 22, 24, 27, 29, 31, 33, 35, 38	2_10, 2_13, 2_14, 3_13, 3_14, 4_13, 4_14, 4_15, 5_13, 5_14, 5_15, 6_13, 6_14, 6_15, 7_13

Table 2. Semantic segmentation results on Vaihingen dataset

Model	IoU					F₁					OA	mIoU	$\overline{F}_1$	Kappa
Model	Impervious surface	Building	Low vegetation	Tree	Car	Impervious surface	Building	Low vegetation	Tree	Car	OA	mIoU	$\overline{F}_1$	Kappa
FCN^[5]	0.669	0.761	0.506	0.697	0.634	0.801	0.864	0.687	0.822	0.768	0.823	0.653	0.788	0.783
SegNet^[21]	0.753	0.805	0.656	0.705	0.458	0.859	0.892	0.792	0.827	0.629	0.845	0.675	0.800	0.791
DeepLabV3^[26]	0.822	0.912	0.711	0.770	0.568	0.902	0.954	0.831	0.870	0.724	0.858	0.757	0.856	0.809
SCAttNetV2^[27]	0.804	0.823	0.667	0.671	0.544	0.891	0.903	0.801	0.803	0.705	0.855	0.702	0.821	0.788
UNet++^[17]	0.783	0.878	0.614	0.716	0.619	0.952	0.952	0.817	0.931	0.809	0.854	0.722	0.892	0.808
SGE-Unet	0.803	0.890	0.706	0.830	0.719	0.935	0.966	0.934	0.941	0.901	0.866	0.790	0.935	0.824
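
The abstract quotes gains of 16% and 11% for the car class without naming the baseline. As a quick check, and only as an inference from Table 2 rather than a statement made by the source, these figures match relative gains over the UNet++ row. The helper `relative_gain` is a hypothetical name; the numbers are copied from the car columns of Table 2.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Car-class values copied from Table 2 (Vaihingen).
car_iou = {"FCN": 0.634, "SegNet": 0.458, "DeepLabV3": 0.568,
           "SCAttNetV2": 0.544, "UNet++": 0.619, "SGE-Unet": 0.719}
car_f1 = {"FCN": 0.768, "SegNet": 0.629, "DeepLabV3": 0.724,
          "SCAttNetV2": 0.705, "UNet++": 0.809, "SGE-Unet": 0.901}

iou_gain = relative_gain(car_iou["SGE-Unet"], car_iou["UNet++"])
f1_gain = relative_gain(car_f1["SGE-Unet"], car_f1["UNet++"])
print(f"car IoU gain over UNet++: {iou_gain:.1%}")
print(f"car F1 gain over UNet++:  {f1_gain:.1%}")
```

Note that FCN has a higher car IoU (0.634) than UNet++ in this table, so the gain over the best competing car IoU would be smaller than the gain over UNet++.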

Table 3. Semantic segmentation results on Potsdam dataset

Model	IoU					F₁					OA	mIoU	$\overline{F}_1$	Kappa
Model	Impervious surface	Building	Low vegetation	Tree	Car	Impervious surface	Building	Low vegetation	Tree	Car	OA	mIoU	$\overline{F}_1$	Kappa
FCN^[5]	0.776	0.799	0.720	0.670	0.797	0.874	0.889	0.837	0.820	0.887	0.808	0.752	0.861	0.781
SegNet^[21]	0.854	0.912	0.786	0.757	0.881	0.921	0.954	0.880	0.862	0.937	0.811	0.838	0.911	0.782
DeepLabV3^[26]	0.882	0.944	0.789	0.745	0.871	0.938	0.971	0.882	0.854	0.931	0.835	0.846	0.915	0.799
SCAttNetV2^[27]	0.818	0.888	0.707	0.663	0.803	0.901	0.941	0.829	0.797	0.891	0.879	0.776	0.872	0.805
UNet++^[17]	0.864	0.923	0.781	0.767	0.865	0.941	0.974	0.895	0.883	0.933	0.872	0.840	0.925	0.828
SGE-Unet	0.879	0.940	0.815	0.795	0.891	0.955	0.982	0.905	0.957	0.969	0.883	0.864	0.954	0.843
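
The tables above report OA, per-class IoU and F₁, their means, and the Kappa coefficient, but this page does not give the formulas. The following is a minimal sketch assuming the standard confusion-matrix definitions; the paper's exact evaluation protocol (for example, how boundary pixels are treated) is not shown here, so this is not necessarily how the published numbers were produced. The function name `segmentation_metrics` and the toy matrix are hypothetical.

```python
def segmentation_metrics(cm):
    """Compute common segmentation metrics from a confusion matrix.

    cm[i][j] = number of pixels whose true class is i and predicted class is j.
    Assumes every class has at least one true or predicted pixel.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_px = [sum(cm[i]) for i in range(n)]                       # row sums
    pred_px = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums
    tp = [cm[i][i] for i in range(n)]

    iou = [tp[i] / (true_px[i] + pred_px[i] - tp[i]) for i in range(n)]
    f1 = [2 * tp[i] / (true_px[i] + pred_px[i]) for i in range(n)]
    oa = sum(tp) / total
    # Expected agreement by chance, used by Cohen's kappa.
    pe = sum(true_px[i] * pred_px[i] for i in range(n)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return {"OA": oa, "IoU": iou, "mIoU": sum(iou) / n,
            "F1": f1, "mean_F1": sum(f1) / n, "Kappa": kappa}


# Toy two-class example with 20 pixels.
metrics = segmentation_metrics([[8, 2],
                                [1, 9]])
print(metrics)
```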

Table 4. Ablation experiment results on Vaihingen dataset

Model	IoU					F₁					OA	mIoU	$\overline{F}_1$	Kappa
Model	Impervious surface	Building	Low vegetation	Tree	Car	Impervious surface	Building	Low vegetation	Tree	Car	OA	mIoU	$\overline{F}_1$	Kappa
UNet++	0.783	0.877	0.613	0.716	0.619	0.973	0.952	0.817	0.930	0.809	0.853	0.722	0.896	0.808
UNet++&EfficientNet	0.815	0.911	0.627	0.721	0.655	0.976	0.967	0.823	0.937	0.832	0.866	0.746	0.907	0.825
UNet++&EfficientNet&SGE	0.813	0.901	0.626	0.722	0.714	0.971	0.963	0.826	0.938	0.894	0.867	0.755	0.918	0.824
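
Each row of Table 4 adds one component to the configuration in the previous row, so the difference between consecutive rows gives a rough estimate of what that component contributes. The short sketch below does this arithmetic for car IoU and mIoU; the values are copied from Table 4, and the helper `stepwise_gains` is a hypothetical name used only for illustration.

```python
# (configuration, car IoU, mIoU) copied from Table 4 (Vaihingen).
ablation = [
    ("UNet++", 0.619, 0.722),
    ("UNet++&EfficientNet", 0.655, 0.746),
    ("UNet++&EfficientNet&SGE", 0.714, 0.755),
]


def stepwise_gains(rows):
    """Return (name, car IoU gain, mIoU gain) for each row vs. the row before it."""
    gains = []
    for (_, prev_car, prev_miou), (name, car, miou) in zip(rows, rows[1:]):
        gains.append((name, round(car - prev_car, 3), round(miou - prev_miou, 3)))
    return gains


gains = stepwise_gains(ablation)
for name, d_car, d_miou in gains:
    print(f"{name}: car IoU {d_car:+.3f}, mIoU {d_miou:+.3f}")
```

In this table, adding SGE raises car IoU by 0.059 while mIoU rises by only 0.009, which is in line with the abstract's emphasis on small-sized objects.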

Table 5. Ablation experiment results on Potsdam dataset

Model	IoU					F₁					OA	mIoU	$\overline{F}_1$	Kappa
Model	Impervious surface	Building	Low vegetation	Tree	Car	Impervious surface	Building	Low vegetation	Tree	Car	OA	mIoU	$\overline{F}_1$	Kappa
UNet++	0.811	0.905	0.695	0.666	0.804	0.965	0.974	0.895	0.883	0.931	0.872	0.776	0.930	0.828
UNet++&EfficientNet	0.829	0.918	0.711	0.691	0.810	0.960	0.986	0.892	0.909	0.943	0.879	0.792	0.938	0.839
UNet++&EfficientNet&SGE	0.826	0.915	0.709	0.694	0.823	0.955	0.982	0.898	0.913	0.970	0.883	0.793	0.944	0.843

Table 6. Comparison of parameters and computational complexity of different models

Model	Parameters/MB	Computational complexity/GFLOPs	OA		$\overline{F}_1$
Model	Parameters/MB	Computational complexity/GFLOPs	Vaihingen	Potsdam	Vaihingen	Potsdam
FCN^[5]	21.30	19.20	0.823	0.808	0.621	0.757
SegNet^[21]	18.82	117.74	0.845	0.811	0.675	0.838
DeepLabV3^[26]	26.01	109.34	0.858	0.835	0.757	0.846
UNet++^[17]	26.79	73.77	0.854	0.872	0.722	0.840
SGE-Unet	20.01	39.13	0.866	0.883	0.790	0.864

References (27)

[1]	YUAN X H, SHI J F, GU L C. A review of deep learning methods for semantic segmentation of remote sensing imagery[J]. Expert Systems with Applications, 2021, 169: 114417. doi: 10.1016/j.eswa.2020.114417
[2]	XING S, XIE Q, WANG M. Semantic segmentation for remote sensing images based on adaptive feature selection network[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 8006705.
[3]	JIANG C C, HUO H T, FENG Q. An object-oriented multi-scale segmentation optimization algorithm based on PCA[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(6): 1192-1203 (in Chinese).
[4]	KAMPFFMEYER M, SALBERG A B, JENSSEN R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks[C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway: IEEE Press, 2016: 680-688.
[5]	SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. doi: 10.1109/TPAMI.2016.2572683
[6]	GUO R, LIU J B, LI N, et al. Pixel-wise classification method for high resolution remote sensing imagery using deep neural networks[J]. ISPRS International Journal of Geo-Information, 2018, 7(3): 110. doi: 10.3390/ijgi7030110
[7]	LI R, DUAN C X, ZHENG S Y, et al. MACU-Net for semantic segmentation of fine-resolution remotely sensed images[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 8007205.
[8]	ALAM M, WANG J F, CONG G P, et al. Convolutional neural network for the semantic segmentation of remote sensing images[J]. Mobile Networks and Applications, 2021, 26(1): 200-215. doi: 10.1007/s11036-020-01703-3
[9]	RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[10]	ZHANG X J, WANG X L. Image segmentation models of remote sensing using full residual connection and multiscale feature fusion[J]. Journal of Remote Sensing, 2020, 24(9): 1120-1133 (in Chinese).
[11]	FENG Y, DIAO W, SUN X, et al. NPALoss: Neighboring pixel affinity loss for semantic segmentation in high-resolution aerial imagery[J]. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020, V-2-2020(I-3): 475-482. doi: 10.5194/isprs-annals-V-2-2020-475-2020
[12]	XIAO C J, LI Y, ZHANG H Q, et al. Semantic segmentation of remote sensing image based on deep fusion networks and conditional random field[J]. Journal of Remote Sensing, 2020, 24(3): 254-264 (in Chinese). doi: 10.11834/jrs.20208298
[13]	ZHAI P B, YANG H, SONG T T, et al. Two-path semantic segmentation algorithm combining attention mechanism[J]. Journal of Image and Graphics, 2020, 25(8): 1627-1636 (in Chinese). doi: 10.11834/jig.190533
[14]	YANG J, YU X Z. Semantic segmentation of high-resolution remote sensing images based on improved FuseNet combined with the Atrous convolution[J]. Geomatics and Information Science of Wuhan University, 2022, 47(7): 1071-1080 (in Chinese). doi: 10.13203/j.whugis20200305
[15]	WANG X Y, CUI Z Y, CAO Z J, et al. Dense docked ship detection via spatial group-wise enhance attention in SAR images[C]// IEEE International Geoscience and Remote Sensing Symposium. Piscataway: IEEE Press, 2021: 1244-1247.
[16]	TAN M X, LE Q V. EfficientNet: Rethinking model scaling for convolutional neural networks[C]// IEEE International Conference on Machine Learning (ICML). Piscataway: IEEE Press, 2019: 6105-6114.
[17]	ZHOU Z W, SIDDIQUEE M M R, TAJBAKHSH N, et al. UNet++: A nested U-net architecture for medical image segmentation[C]// International Workshop on Deep Learning in Medical Image Analysis, International Workshop on Multimodal Learning for Clinical Decision Support. Berlin: Springer, 2018: 3-11.
[18]	LI D J, GUO H T, LU J, et al. A remote sensing image classification procedure based on multilevel attention fusion U-Net[J]. Acta Geodaetica et Cartographica Sinica, 2020, 49(8): 1051-1064 (in Chinese). doi: 10.11947/j.AGCS.2020.20190407
[19]	YAN Y S. Image recognition by deep learning: Core technologies and practices[M]. Beijing: China Machine Press, 2019: 231-232 (in Chinese).
[20]	ROTTENSTEINER F, SOHN G, JUNG J, et al. The ISPRS benchmark on urban object classification and 3D building reconstruction[J]. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2012(I-3): 293-298.
[21]	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
[22]	CHAI D F, NEWSAM S, HUANG J F. Aerial image semantic segmentation using DCNN predicted distance maps[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 161: 309-322. doi: 10.1016/j.isprsjprs.2020.01.023
[23]	XU Z Y, SU C, ZHANG X C. A semantic segmentation method with category boundary for land use and land cover (LULC) mapping of very-high resolution (VHR) remote sensing image[J]. International Journal of Remote Sensing, 2021, 42(8): 3146-3165. doi: 10.1080/01431161.2020.1871100
[24]	HU W, GAO B C, HUANG Z H, et al. Semantic segmentation of urban remote sensing image based on optimized tree structure convolutional neural network[J]. Journal of Image and Graphics, 2020, 25(5): 1043-1052 (in Chinese). doi: 10.11834/jig.190324
[25]	KINGMA D, BA J. Adam: A method for stochastic optimization[C]// International Conference on Learning Representations (ICLR). [S.l.]: ICLR, 2015.
[26]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[27]	LI H F, QIU K J, CHEN L, et al. SCAttNet: Semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 18(5): 905-909.