Semantic segmentation model for remote sensing images based on U-Net++ guided by dual attention

LIU Chunjuan; XIN Yuqiang; WU Xiaosuo; YAN Haowen

doi: 10.13700/j.bh.1001-5965.2024.0122

1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. College of Surveying and Geo-informatics, Lanzhou Jiaotong University, Lanzhou 730070, China

Funds:

National Key Research and Development Program of China (2022YFB3903604); Key R & D Projects in Gansu Province (20YF8GA035); Gansu Provincial Natural Science Foundation (22JR5RA320); Lanzhou Jiaotong University Youth Science Fund (2021029)

Corresponding author: E-mail: wuxs_laser@lzjtu.edu.cn

CLC number: TP751

Publication history
- Received: 2024-03-04
- Accepted: 2024-04-08
- Published online: 2024-06-27
- Issue published: 2026-05-26

Abstract:
Assigning land-cover class labels to individual pixels with semantic segmentation algorithms is an essential part of the intelligent interpretation of remote sensing images. In high-resolution remote sensing images, large scale differences between object categories and complex scenes lead to incomplete segmentation of object edges and low segmentation accuracy for small-scale objects. To address these problems, a dual-attention-guided U-Net++ semantic segmentation model is proposed. In the encoding stage, a dual-branch backbone network extracts features, and mutual attention captures the dependencies between pixels of feature maps at different scales, adaptively fusing features of different scales at the same network depth to strengthen the attention paid to small-scale objects. In the decoding stage, a hybrid spatial and channel attention mechanism narrows the semantic gap between the outputs of sub-decoders of different depths while fusing their semantic information and spatial location representations at different levels, which addresses fine segmentation in complex scenes. Experimental results show that the proposed algorithm achieves mean intersection over union (mIoU) values of 86.77% and 82.73% on the Potsdam and Vaihingen datasets, respectively, with mean F₁-scores of 92.32% and 90.79%. Its overall performance is significantly better than that of comparison algorithms such as U-Net++, FarSeg, DMAU-Net, and SAPNet, and its segmentation of small-scale objects is clearly improved.

Keywords:
- remote sensing image
- semantic segmentation
- U-Net++
- attention mechanism
- small-scale objects
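
The encoding-stage idea in the abstract, mutual attention between feature maps from two branches, can be illustrated with a small NumPy sketch. This is not the authors' module: the function name `mutual_attention` is hypothetical, no learned projections are used, and both feature maps are assumed to have already been resampled to the same shape `(C, H, W)`. Each pixel of one branch attends over all pixels of the other branch, and the two enriched maps are summed.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mutual_attention(feat_a, feat_b):
    """Fuse two (C, H, W) feature maps with cross-attention in both directions.

    Hypothetical helper for illustration only: queries come from one branch,
    keys and values from the other, with no learned weights.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("both feature maps must share the same (C, H, W) shape")
    c, h, w = feat_a.shape
    a = feat_a.reshape(c, h * w).T  # (N, C), one row per pixel
    b = feat_b.reshape(c, h * w).T
    scale = 1.0 / np.sqrt(c)
    attn_ab = softmax(a @ b.T * scale)  # (N, N): pixels of a attend over b
    attn_ba = softmax(b @ a.T * scale)  # (N, N): pixels of b attend over a
    a_enriched = a + attn_ab @ b        # residual plus context from the other branch
    b_enriched = b + attn_ba @ a
    fused = a_enriched + b_enriched
    return fused.T.reshape(c, h, w), attn_ab


rng = np.random.default_rng(0)
fine = rng.normal(size=(8, 4, 4))    # stand-in for one branch's features
coarse = rng.normal(size=(8, 4, 4))  # stand-in for the other branch's features
fused, attn = mutual_attention(fine, coarse)
print(fused.shape, attn.shape)  # (8, 4, 4) (16, 16)
```

Because the attention weights are computed from pixel-to-pixel similarities across the two branches, the fusion is data dependent rather than a fixed sum or concatenation, which is the sense in which the abstract calls the fusion adaptive.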

Figure 1. U-Net++ guided by dual attention

Figure 2. Convolution block architecture

Figure 3. Multi-scale mutual attention module

Figure 4. Global-local attention fusion module

Figure 5. Qualitative comparison results of ablation experiments on Potsdam dataset

Figure 6. Qualitative comparison results of ablation experiments on Vaihingen dataset

Figure 7. Qualitative comparison results of comparative experiments on Potsdam dataset

Figure 8. Qualitative comparison results of comparative experiments on Vaihingen dataset
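
The decoding-stage mechanism described in the abstract mixes channel and spatial attention when combining sub-decoder outputs. The sketch below shows one generic way to do that, in the style of CBAM [15], and should not be read as the authors' fusion module. All function names and the weight arguments `w1`, `w2`, `w_avg`, and `w_max` are assumptions for illustration, and the spatial gate uses a per-pixel mix of the mean and max maps instead of CBAM's convolution to keep the example short.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) map; w1 is (C//r, C), w2 is (C, C//r)."""
    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling

    def mlp(v):
        # Shared two-layer bottleneck with ReLU.
        return w2 @ np.maximum(w1 @ v, 0.0)

    gate = sigmoid(mlp(avg) + mlp(mx))  # (C,), values in (0, 1)
    return feat * gate[:, None, None]


def spatial_attention(feat, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight positions using channel-wise mean and max maps."""
    avg_map = feat.mean(axis=0)  # (H, W)
    max_map = feat.max(axis=0)   # (H, W)
    gate = sigmoid(w_avg * avg_map + w_max * max_map + bias)
    return feat * gate[None, :, :]


def hybrid_attention_fusion(shallow, deep, w1, w2):
    """Sum two decoder outputs of equal shape, then apply channel and spatial gates."""
    mixed = shallow + deep
    return spatial_attention(channel_attention(mixed, w1, w2))


rng = np.random.default_rng(0)
channels, reduction = 8, 2
w1 = rng.normal(size=(channels // reduction, channels))
w2 = rng.normal(size=(channels, channels // reduction))
shallow = rng.normal(size=(channels, 6, 6))  # stand-in for a shallow sub-decoder output
deep = rng.normal(size=(channels, 6, 6))     # stand-in for a deeper sub-decoder output
out = hybrid_attention_fusion(shallow, deep, w1, w2)
print(out.shape)  # (8, 6, 6)
```

Both gates lie strictly between 0 and 1, so the fused map is a channel-wise and position-wise attenuated version of the summed inputs. In a trained network the weights would be learned so that informative channels and locations are attenuated least.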

Table 1. Strategy description for ablation experiments

Model	Strategy description
Baseline	U-Net++ network with a ResNet18 backbone
Baseline + MMA	Baseline plus dual-branch backbone input and the MMA module
Baseline + GLAM	Baseline plus the GLAM module
DAU-Net++ (Baseline + MMA + GLAM)	Baseline plus dual-branch backbone input, the MMA module, and the GLAM module

Table 2. Quantitative assessment results of ablation experiments on Potsdam dataset

Model	IoU/% (impervious surface)	IoU/% (building)	IoU/% (low vegetation)	IoU/% (tree)	IoU/% (car)	mIoU/%	OA/%	mF1/%
Baseline	84.37	87.02	79.97	68.57	69.05	80.36	88.63	87.32
Baseline + MMA	86.04	89.91	83.92	75.37	76.21	84.35	90.84	91.44
Baseline + GLAM	85.87	90.71	84.64	76.19	75.39	84.66	91.21	91.73
DAU-Net++	87.72	91.85	86.49	78.21	77.42	86.77	91.97	92.32
Note: In the original table, bold marks the best value for each metric and underline marks the second-best value.
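
For readers unfamiliar with the column headings, the sketch below computes per-class IoU, mIoU, overall accuracy (OA), and mean F1 from a confusion matrix using their standard definitions. The function name `segmentation_metrics` and the toy matrix are illustrative assumptions. The evaluation protocol behind the reported numbers (for example, which classes enter each average) is not stated here, so this is not a way to reproduce the table values.

```python
import numpy as np


def segmentation_metrics(conf):
    """Standard metrics from a confusion matrix.

    conf[i, j] is the number of pixels of true class i predicted as class j.
    Assumes every class occurs at least once, so no denominator is zero.
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)            # correctly labelled pixels per class
    fp = conf.sum(axis=0) - tp    # predicted as the class but belonging to another
    fn = conf.sum(axis=1) - tp    # belonging to the class but predicted as another
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    oa = tp.sum() / conf.sum()
    return {"IoU": iou, "mIoU": iou.mean(), "OA": oa, "mF1": f1.mean()}


# Toy two-class example (hypothetical pixel counts).
toy = [[8, 2],
       [1, 9]]
metrics = segmentation_metrics(toy)
print(round(metrics["mIoU"], 4), round(metrics["OA"], 4), round(metrics["mF1"], 4))
# 0.7386 0.85 0.8496
```

IoU penalizes both false positives and false negatives for a class, which is why it is usually lower than the F1 score computed from the same counts.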

Table 3. Quantitative assessment results of ablation experiments on Vaihingen dataset

Model	IoU/% (impervious surface)	IoU/% (building)	IoU/% (low vegetation)	IoU/% (tree)	IoU/% (car)	mIoU/%	OA/%	mF1/%
Baseline	88.06	89.64	69.82	78.17	66.02	78.49	89.66	86.58
Baseline + MMA	90.86	93.22	75.99	81.64	73.40	80.88	90.85	89.29
Baseline + GLAM	91.15	93.41	75.61	83.95	71.68	81.03	91.04	88.84
DAU-Net++	92.26	94.51	77.44	85.53	75.31	82.73	91.68	90.79
Note: In the original table, bold marks the best value for each metric and underline marks the second-best value.
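
The contribution of each module in the ablation study is easiest to read as a gain over the baseline. The short script below does that arithmetic on the mIoU values copied from Tables 2 and 3. The dictionary layout and the helper name `gains_over_baseline` are illustrative choices, not part of the paper.

```python
# mIoU values (%) copied from Tables 2 and 3.
ablation_miou = {
    "Potsdam": {
        "Baseline": 80.36,
        "Baseline + MMA": 84.35,
        "Baseline + GLAM": 84.66,
        "DAU-Net++": 86.77,
    },
    "Vaihingen": {
        "Baseline": 78.49,
        "Baseline + MMA": 80.88,
        "Baseline + GLAM": 81.03,
        "DAU-Net++": 82.73,
    },
}


def gains_over_baseline(scores):
    """Return the mIoU gain, in percentage points, of each variant over the baseline."""
    base = scores["Baseline"]
    return {
        name: round(value - base, 2)
        for name, value in scores.items()
        if name != "Baseline"
    }


for dataset, scores in ablation_miou.items():
    print(dataset, gains_over_baseline(scores))
```

On these numbers the full model gains 6.41 percentage points of mIoU over the baseline on Potsdam and 4.24 on Vaihingen, and on both datasets the combined gain is larger than the gain from either module alone.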

Table 4. Quantitative assessment results of comparative experiments on Potsdam dataset

Model	IoU/% (impervious surface)	IoU/% (building)	IoU/% (low vegetation)	IoU/% (tree)	IoU/% (car)	mIoU/%	OA/%	mF1/%	Floating-point operation speed/10⁹ s⁻¹
PSPNet[5]	80.78	85.32	78.55	56.21	65.84	76.17	88.32	85.79	67.95
U-Net[10]	81.63	84.51	76.57	60.38	67.57	76.75	87.69	85.26	63.59
U-Net++[11]	84.37	87.02	79.97	68.57	69.05	80.36	88.63	87.32	90.16
EMANet[25]	83.64	90.38	81.15	72.72	71.60	81.87	89.38	88.59	126.49
FarSeg[26]	84.39	89.63	80.95	74.62	70.46	82.55	89.13	87.89	193.23
DA-IMRN[21]	87.86	90.21	81.62	74.53	74.39	84.96	90.24	89.73	167.68
MACU-Net[27]	86.64	90.36	80.69	76.58	73.37	84.76	90.18	90.86	213.58
DMAU-Net[22]	88.89	92.03	84.94	76.39	75.46	85.68	91.40	91.15	186.74
SAPNet[23]	88.36	92.14	83.91	78.52	76.31	85.73	91.24	91.68	317.94
DAU-Net++	87.72	91.85	86.49	78.21	77.42	86.77	91.97	92.32	257.41
Note: In the original table, bold marks the best value for each metric and underline marks the second-best value.

Table 5. Quantitative assessment results of comparative experiments on Vaihingen dataset

Model	IoU/% (impervious surface)	IoU/% (building)	IoU/% (low vegetation)	IoU/% (tree)	IoU/% (car)	mIoU/%	OA/%	mF1/%
PSPNet[5]	85.30	86.44	62.53	76.44	57.39	73.78	86.55	83.18
U-Net[10]	84.76	87.49	63.28	74.81	59.36	72.18	85.79	83.24
U-Net++[11]	88.06	89.64	69.82	78.17	66.02	78.49	89.66	86.58
EMANet[25]	89.76	92.21	72.57	81.63	72.08	80.37	90.63	88.26
FarSeg[26]	90.58	91.99	73.08	82.79	71.02	79.62	89.98	88.67
DA-IMRN[21]	88.94	93.61	74.56	82.61	72.83	80.89	90.47	89.45
MACU-Net[27]	92.21	93.29	73.82	82.95	73.28	81.42	90.58	89.53
DMAU-Net[22]	92.74	95.26	76.51	83.79	73.64	81.38	90.75	89.96
SAPNet[23]	92.58	94.62	75.16	84.16	74.24	81.69	91.08	89.44
DAU-Net++	92.26	94.51	77.44	85.53	75.31	82.73	91.68	90.79
Note: In the original table, bold marks the best value for each metric and underline marks the second-best value.

References (27)

[1]	YANG J, ZHANG J Y. U-shaped semantic segmentation network of high-resolution remote sensing images embedded with self-attention mechanism[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(5): 1514-1527(in Chinese).
[2]	WU Y H, ZHANG Z Z, HUA B, et al. Autonomous cloud detection for remote sensing images using convolutional neural network[J]. Journal of Harbin Institute of Technology, 2020, 52(12): 27-34(in Chinese).
[3]	LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
[4]	CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-06-17)[2024-03-01]. https://arxiv.org/abs/1706.05587.
[5]	ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6230-6239.
[6]	WANG X, LI Z S, HUANG Y P, et al. Multimodal medical image segmentation using multi-scale context-aware network[J]. Neurocomputing, 2022, 486: 135-146.
[7]	DOU F R, ZHANG C F, HU D, et al. EASNet: a multiscale attention semantic segmentation network combined with asymmetric convolution[J]. Journal of Electronic Imaging, 2022, 31(4): 043034.
[8]	LUO J, ZHAO L, ZHU L, et al. Multi-scale receptive field fusion network for lightweight image super-resolution[J]. Neurocomputing, 2022, 493: 314-326.
[9]	LIN D, SHEN D G, SHEN S T, et al. ZigZagNet: fusing top-down and bottom-up context for object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 7482-7491.
[10]	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[11]	ZHOU Z W, SIDDIQUEE M M R, TAJBAKHSH N, et al. U-Net++: a nested U-Net architecture for medical image segmentation[C]//Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Berlin: Springer, 2018: 3-11.
[12]	CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 833-851.
[13]	HE X, ZHOU Y, ZHAO J Q, et al. Swin Transformer embedding U-Net for remote sensing image semantic segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 4408715.
[14]	CUI W, WANG F, HE X, et al. Multi-scale semantic segmentation and spatial relationship recognition of remote sensing images based on an attention model[J]. Remote Sensing, 2019, 11(9): 1044.
[15]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
[16]	QI X Q, LI K Q, LIU P K, et al. Deep attention and multi-scale networks for accurate remote sensing image segmentation[J]. IEEE Access, 2020, 8: 146627-146639.
[17]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[18]	FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 3141-3149.
[19]	HUANG Z L, WANG X G, WEI Y C, et al. CCNet: criss-cross attention for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6896-6908.
[20]	OKTAY O, SCHLEMPER J, LE FOLGOC L, et al. Attention U-Net: learning where to look for the pancreas[EB/OL]. (2018-05-20)[2024-03-01]. https://arxiv.org/abs/1804.03999.
[21]	ZOU L, ZHANG Z F, DU H J, et al. DA-IMRN: dual-attention-guided interactive multi-scale residual network for hyperspectral image classification[J]. Remote Sensing, 2022, 14(3): 530.
[22]	YANG Y, DONG J W, WANG Y H, et al. DMAU-Net: an attention-based multiscale max-pooling dense network for the semantic segmentation in VHR remote-sensing images[J]. Remote Sensing, 2023, 15(5): 1328.
[23]	LI X, XU F, LIU F, et al. A synergistical attention model for semantic segmentation of remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5400916.
[24]	LIU C J, QIAO Z, YAN H W, et al. Semantic segmentation network of remote sensing images based on dual path supervision[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(3): 732-741(in Chinese).
[25]	LI X, ZHONG Z S, WU J L, et al. Expectation-maximization attention networks for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 9166-9175.
[26]	ZHENG Z, ZHONG Y F, WANG J J, et al. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4095-4104.
[27]	LI R, DUAN C X, ZHENG S Y, et al. MACU-Net for semantic segmentation of fine-resolution remotely sensed images[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 8007205.