Semantic segmentation model for remote sensing images based on U-Net++ guided by dual attention
Abstract: Assigning a land-cover class label to each pixel with a semantic segmentation algorithm is an essential part of the intelligent interpretation of remote sensing images. In high-resolution remote sensing images, large scale differences between object categories and complex scenes lead to incomplete segmentation of object edges and low segmentation accuracy for small-scale objects. To address these problems, this paper proposes a dual-attention-guided U-Net++ semantic segmentation model. In the encoding stage, a dual-branch backbone network extracts features, and mutual attention captures the dependencies between pixels of feature maps at different scales, adaptively fusing features of different scales at the same network depth to strengthen attention to small-scale objects. In the decoding stage, a hybrid spatial and channel attention mechanism narrows the semantic gap between the outputs of sub-decoders of different depths while fusing their semantic information and spatial location representations at different levels, which supports fine segmentation in complex scenes. Experimental results show that the proposed algorithm achieves a mean intersection over union (mIoU) of 86.77% and 82.73% and a mean F1-score of 92.32% and 90.79% on the Potsdam and Vaihingen datasets, respectively. Its overall performance is significantly better than that of the comparison algorithms, including U-Net++, FarSeg, DMAU-Net, and SAPNet, and its segmentation of small-scale objects improves markedly.
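The abstract describes mutual attention only at a high level, so the following is a minimal sketch of the general idea rather than the paper's module. It assumes plain scaled dot-product attention applied in both directions between two toy feature maps, each flattened to a list of per-pixel channel vectors, with a residual addition as the fusion step. The names `cross_attend` and `mutual_fuse` are hypothetical, and no learned projections are included.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys, values):
    """Scaled dot-product attention: every query mixes the value vectors.

    queries: N vectors of length C; keys and values: M vectors of length C.
    Returns N vectors of length C.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def mutual_fuse(fine, coarse):
    """Hypothetical fusion: each scale attends to the other, added residually."""
    fine_ctx = cross_attend(fine, coarse, coarse)    # fine pixels query coarse pixels
    coarse_ctx = cross_attend(coarse, fine, fine)    # coarse pixels query fine pixels
    fused_fine = [[f + a for f, a in zip(fr, ar)] for fr, ar in zip(fine, fine_ctx)]
    fused_coarse = [[f + a for f, a in zip(cr, ar)] for cr, ar in zip(coarse, coarse_ctx)]
    return fused_fine, fused_coarse

# Toy inputs (assumed values): 4 fine-scale pixels and 2 coarse-scale pixels, 2 channels each.
fine = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
coarse = [[1.0, 0.0], [0.0, 1.0]]
fused_fine, fused_coarse = mutual_fuse(fine, coarse)
```

Each output keeps the shape of its own input, so a fused map could replace the original at the same network depth. How the actual model projects, weights, or combines the two branches is not stated in the abstract.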
Keywords:
- remote sensing image
- semantic segmentation
- U-Net++
- attention mechanism
- small-scale objects
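The abstract quotes mIoU and mean F1-score, and the result tables also list per-class IoU and overall accuracy (OA). As a reminder of what these numbers measure, here is a minimal sketch that computes them from a confusion matrix using the conventional definitions. The page does not state the exact formulas or which classes are averaged, so the definitions, the function name `segmentation_metrics`, and the toy counts are all assumptions.

```python
def segmentation_metrics(conf):
    """Compute per-class IoU/F1, mIoU, mean F1 and OA from a confusion matrix.

    conf[i][j] = number of pixels whose true class is i and predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, f1s = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                          # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp     # predicted c, true class differs
        iou_den = tp + fp + fn
        f1_den = 2 * tp + fp + fn
        ious.append(tp / iou_den if iou_den else 0.0)
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
    correct = sum(conf[c][c] for c in range(n))
    return {
        "iou": ious,
        "miou": sum(ious) / n,
        "f1": f1s,
        "mf1": sum(f1s) / n,
        "oa": correct / total if total else 0.0,
    }

# Toy 3-class confusion matrix (made-up counts, not from the paper).
conf = [
    [6, 2, 0],
    [2, 6, 0],
    [0, 0, 4],
]
metrics = segmentation_metrics(conf)
print(metrics)
```

IoU penalizes both missed and spurious pixels of a class, so a small class with a few wrong pixels can pull mIoU down sharply while OA barely moves. That is one reason per-class IoU is reported alongside OA.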
Table 1. Strategy description for ablation experiments

| Model | Strategy description |
| --- | --- |
| Baseline | U-Net++ with a ResNet18 backbone |
| Baseline + MMA | Baseline plus dual-branch backbone input and the MMA module |
| Baseline + GLAM | Baseline plus the GLAM module |
| DAU-Net++ (Baseline + MMA + GLAM) | Baseline plus dual-branch backbone input, the MMA module, and the GLAM module |
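Table 2. Quantitative assessment results of ablation experiments on Potsdam dataset

| Network model | IoU/%: Impervious surface | IoU/%: Building | IoU/%: Low vegetation | IoU/%: Tree | IoU/%: Car | mIoU/% | OA/% | mF1/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 84.37 | 87.02 | 79.97 | 68.57 | 69.05 | 80.36 | 88.63 | 87.32 |
| Baseline + MMA | 86.04 | 89.91 | 83.92 | 75.37 | 76.21 | 84.35 | 90.84 | 91.44 |
| Baseline + GLAM | 85.87 | 90.71 | 84.64 | 76.19 | 75.39 | 84.66 | 91.21 | 91.73 |
| DAU-Net++ | 87.72 | 91.85 | 86.49 | 78.21 | 77.42 | 86.77 | 91.97 | 92.32 |

Note: In the source table, bold marks the best value for each metric and underlining marks the second-best.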
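Table 3. Quantitative assessment results of ablation experiments on Vaihingen dataset

| Network model | IoU/%: Impervious surface | IoU/%: Building | IoU/%: Low vegetation | IoU/%: Tree | IoU/%: Car | mIoU/% | OA/% | mF1/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 88.06 | 89.64 | 69.82 | 78.17 | 66.02 | 78.49 | 89.66 | 86.58 |
| Baseline + MMA | 90.86 | 93.22 | 75.99 | 81.64 | 73.40 | 80.88 | 90.85 | 89.29 |
| Baseline + GLAM | 91.15 | 93.41 | 75.61 | 83.95 | 71.68 | 81.03 | 91.04 | 88.84 |
| DAU-Net++ | 92.26 | 94.51 | 77.44 | 85.53 | 75.31 | 82.73 | 91.68 | 90.79 |

Note: In the source table, bold marks the best value for each metric and underlining marks the second-best.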
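Table 4. Quantitative assessment results of comparative experiments on Potsdam dataset

| Network model | IoU/%: Impervious surface | IoU/%: Building | IoU/%: Low vegetation | IoU/%: Tree | IoU/%: Car | mIoU/% | OA/% | mF1/% | Floating-point operation rate/(10^9 s^-1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet[5] | 80.78 | 85.32 | 78.55 | 56.21 | 65.84 | 76.17 | 88.32 | 85.79 | 67.95 |
| U-Net[10] | 81.63 | 84.51 | 76.57 | 60.38 | 67.57 | 76.75 | 87.69 | 85.26 | 63.59 |
| U-Net++[11] | 84.37 | 87.02 | 79.97 | 68.57 | 69.05 | 80.36 | 88.63 | 87.32 | 90.16 |
| EMANet[25] | 83.64 | 90.38 | 81.15 | 72.72 | 71.60 | 81.87 | 89.38 | 88.59 | 126.49 |
| FarSeg[26] | 84.39 | 89.63 | 80.95 | 74.62 | 70.46 | 82.55 | 89.13 | 87.89 | 193.23 |
| DA-IMRN[21] | 87.86 | 90.21 | 81.62 | 74.53 | 74.39 | 84.96 | 90.24 | 89.73 | 167.68 |
| MACU-Net[27] | 86.64 | 90.36 | 80.69 | 76.58 | 73.37 | 84.76 | 90.18 | 90.86 | 213.58 |
| DMAU-Net[22] | 88.89 | 92.03 | 84.94 | 76.39 | 75.46 | 85.68 | 91.40 | 91.15 | 186.74 |
| SAPNet[23] | 88.36 | 92.14 | 83.91 | 78.52 | 76.31 | 85.73 | 91.24 | 91.68 | 317.94 |
| DAU-Net++ | 87.72 | 91.85 | 86.49 | 78.21 | 77.42 | 86.77 | 91.97 | 92.32 | 257.41 |

Note: In the source table, bold marks the best value for each metric and underlining marks the second-best.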
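Table 5. Quantitative assessment results of comparative experiments on Vaihingen dataset

| Network model | IoU/%: Impervious surface | IoU/%: Building | IoU/%: Low vegetation | IoU/%: Tree | IoU/%: Car | mIoU/% | OA/% | mF1/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet[5] | 85.30 | 86.44 | 62.53 | 76.44 | 57.39 | 73.78 | 86.55 | 83.18 |
| U-Net[10] | 84.76 | 87.49 | 63.28 | 74.81 | 59.36 | 72.18 | 85.79 | 83.24 |
| U-Net++[11] | 88.06 | 89.64 | 69.82 | 78.17 | 66.02 | 78.49 | 89.66 | 86.58 |
| EMANet[25] | 89.76 | 92.21 | 72.57 | 81.63 | 72.08 | 80.37 | 90.63 | 88.26 |
| FarSeg[26] | 90.58 | 91.99 | 73.08 | 82.79 | 71.02 | 79.62 | 89.98 | 88.67 |
| DA-IMRN[21] | 88.94 | 93.61 | 74.56 | 82.61 | 72.83 | 80.89 | 90.47 | 89.45 |
| MACU-Net[27] | 92.21 | 93.29 | 73.82 | 82.95 | 73.28 | 81.42 | 90.58 | 89.53 |
| DMAU-Net[22] | 92.74 | 95.26 | 76.51 | 83.79 | 73.64 | 81.38 | 90.75 | 89.96 |
| SAPNet[23] | 92.58 | 94.62 | 75.16 | 84.16 | 74.24 | 81.69 | 91.08 | 89.44 |
| DAU-Net++ | 92.26 | 94.51 | 77.44 | 85.53 | 75.31 | 82.73 | 91.68 | 90.79 |

Note: In the source table, bold marks the best value for each metric and underlining marks the second-best.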
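As a worked check of the small-scale-object claim, the sketch below takes the car IoU and mIoU values listed for U-Net++ and DAU-Net++ in the comparative results on both datasets and computes the gain in percentage points. Treating the car class as a proxy for small-scale objects is an assumption of this example, and the helper name `gain_over_baseline` is hypothetical.

```python
# Values copied from the U-Net++ and DAU-Net++ rows of the comparative results (in %).
results = {
    "Potsdam": {
        "U-Net++": {"car_iou": 69.05, "miou": 80.36},
        "DAU-Net++": {"car_iou": 77.42, "miou": 86.77},
    },
    "Vaihingen": {
        "U-Net++": {"car_iou": 66.02, "miou": 78.49},
        "DAU-Net++": {"car_iou": 75.31, "miou": 82.73},
    },
}

def gain_over_baseline(scores, model="DAU-Net++", baseline="U-Net++"):
    """Return the per-metric difference (model minus baseline) in percentage points."""
    return {m: round(scores[model][m] - scores[baseline][m], 2) for m in scores[model]}

gains = {dataset: gain_over_baseline(scores) for dataset, scores in results.items()}
for dataset, g in gains.items():
    print(f"{dataset}: car IoU +{g['car_iou']:.2f} points, mIoU +{g['miou']:.2f} points")
```

On both datasets the car IoU gain (8.37 and 9.29 points) is larger than the mIoU gain (6.41 and 4.24 points), which is consistent with the reported improvement on small-scale objects.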
[1] YANG J, ZHANG J Y. U-shaped semantic segmentation network of high-resolution remote sensing images embedded with self-attention mechanism[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(5): 1514-1527 (in Chinese).
[2] WU Y H, ZHANG Z Z, HUA B, et al. Autonomous cloud detection for remote sensing images using convolutional neural network[J]. Journal of Harbin Institute of Technology, 2020, 52(12): 27-34 (in Chinese).
[3] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
[4] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-06-17)[2024-03-01]. https://arxiv.org/abs/1706.05587.
[5] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6230-6239.
[6] WANG X, LI Z S, HUANG Y P, et al. Multimodal medical image segmentation using multi-scale context-aware network[J]. Neurocomputing, 2022, 486: 135-146.
[7] DOU F R, ZHANG C F, HU D, et al. EASNet: a multiscale attention semantic segmentation network combined with asymmetric convolution[J]. Journal of Electronic Imaging, 2022, 31(4): 043034.
[8] LUO J, ZHAO L, ZHU L, et al. Multi-scale receptive field fusion network for lightweight image super-resolution[J]. Neurocomputing, 2022, 493: 314-326.
[9] LIN D, SHEN D G, SHEN S T, et al. ZigZagNet: fusing top-down and bottom-up context for object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 7482-7491.
[10] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[11] ZHOU Z W, SIDDIQUEE M M R, TAJBAKHSH N, et al. U-Net++: a nested U-Net architecture for medical image segmentation[C]//Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Berlin: Springer, 2018: 3-11.
[12] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 833-851.
[13] HE X, ZHOU Y, ZHAO J Q, et al. Swin Transformer embedding U-Net for remote sensing image semantic segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 4408715.
[14] CUI W, WANG F, HE X, et al. Multi-scale semantic segmentation and spatial relationship recognition of remote sensing images based on an attention model[J]. Remote Sensing, 2019, 11(9): 1044.
[15] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
[16] QI X Q, LI K Q, LIU P K, et al. Deep attention and multi-scale networks for accurate remote sensing image segmentation[J]. IEEE Access, 2020, 8: 146627-146639.
[17] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[18] FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 3141-3149.
[19] HUANG Z L, WANG X G, WEI Y C, et al. CCNet: criss-cross attention for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6896-6908.
[20] OKTAY O, SCHLEMPER J, LE FOLGOC L, et al. Attention U-Net: learning where to look for the pancreas[EB/OL]. (2018-05-20)[2024-03-01]. https://arxiv.org/abs/1804.03999.
[21] ZOU L, ZHANG Z F, DU H J, et al. DA-IMRN: dual-attention-guided interactive multi-scale residual network for hyperspectral image classification[J]. Remote Sensing, 2022, 14(3): 530.
[22] YANG Y, DONG J W, WANG Y H, et al. DMAU-Net: an attention-based multiscale max-pooling dense network for the semantic segmentation in VHR remote-sensing images[J]. Remote Sensing, 2023, 15(5): 1328.
[23] LI X, XU F, LIU F, et al. A synergistical attention model for semantic segmentation of remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5400916.
[24] LIU C J, QIAO Z, YAN H W, et al. Semantic segmentation network of remote sensing images based on dual path supervision[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(3): 732-741 (in Chinese).
[25] LI X, ZHONG Z S, WU J L, et al. Expectation-maximization attention networks for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 9166-9175.
[26] ZHENG Z, ZHONG Y F, WANG J J, et al. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4095-4104.
[27] LI R, DUAN C X, ZHENG S Y, et al. MACU-Net for semantic segmentation of fine-resolution remotely sensed images[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 8007205.

