增强上下文特征交互的实时无人机影像分割算法

李云红; 张富星; 苏雪平; 李丽敏; 王梅; 梁成名

doi:10.13700/j.bh.1001-5965.2023.0830

增强上下文特征交互的实时无人机影像分割算法

doi: 10.13700/j.bh.1001-5965.2023.0830

西安工程大学电子信息学院，西安 710048

基金项目:

国家自然科学基金(62203344)；陕西省自然科学基础研究计划重点项目(2022JZ-35)；陕西高校青年创新团队；西安市“科学家+工程师”队伍建设项目(25KGYB00029)

详细信息

通讯作者:
E-mail：hitliyunhong@163.com

中图分类号: TP391.41
计量
- 文章访问数: 226
- HTML全文浏览量: 107
- PDF下载量: 148
- 被引次数: 0
出版历程
- 收稿日期: 2023-12-22
- 录用日期: 2024-03-29
- 网络出版日期: 2024-04-18
- 整期出版日期: 2026-03-31

Real-time UAV image segmentation algorithm with enhanced contextual feature interaction

School of Electronics and Information，Xi’an Polytechnic University，Xi’an 710048，China

Funds:

National Natural Science Foundation of China (62203344); Key Projects of Natural Science Basic Research Program of Shaanxi (2022JZ-35); The Youth Innovation Team of Shaanxi Universities; Xi’an City “Scientist+Engineering” Team Construction Project (25KGYB00029)

More Information

Corresponding author: E-mail：hitliyunhong@163.com

摘要

摘要:
针对无人机影像语义分割任务中轻量级算法缺乏全局信息交互导致分割结果中目标漏检与不完整问题，提出了一种增强上下文特征交互的实时无人机影像分割算法。算法采用双分支结构，利用不同方向的全局平均池化对通道和空间信息进行编码，保留了精准的位置信息，并增强了对图像中局部细节信息的关注；利用位置感知循环卷积和空间加权构建全局感知提取模块，实现了全局上下文信息捕获；对不同尺度特征采用加权操作进行融合，降低了融合过程中的信息损失与算法的计算量。在UAVid 和AeroScapes数据集上对所提算法进行验证，结果显示：平均交并比(mIoU)分别达到66.5%和63.0%，相比BiSeNet V2提升了2.6%和2.2%，分割速度分别达到79.9帧/s和71.4帧/s，相比BiSeNet V2提升了8.3帧/s和6.9帧/s，在保证实时分割速度的同时取得了较好的分割精度。
- 语义分割 /
- 无人机影像 /
- 卷积神经网络 /
- 上下文信息 /
- 特征融合
Abstract:
A real-time UAV image segmentation algorithm with enhanced contextual feature interaction is proposed to address the problem of target omission and incompleteness in segmentation results due to the lack of global information interaction in lightweight algorithms for semantic segmentation tasks of UAV images. The approach uses a two-branch structure. To encode the channel and spatial information, global average pooling in various directions was used. This preserves the correct position information and increases the attention to the local detail information in the image. Secondly, a global perceptual extraction module was constructed by using the position-aware circular convolution and spatial weighting, which achieves the global contextual information capture; Finally, the weighting operation is applied to the features of different scales for the fusion, which reduces the information loss in the fusion process and the computation of the algorithm. The UAVid and AeroScapes datasets are used to validate the algorithm. The results indicate that the mean intersection over union (mIoU) achieved 66.5% and 63.0%, respectively, marking a 2.6% and 2.2% improvement over BiSeNet V2. The segmentation speeds reached 79.9 and 71.4 frames per second, respectively, showing an increase of 8.3 and 6.9 frames per second compared to BiSeNet V2. This method ensures real-time segmentation speed while delivering satisfactory segmentation accuracy.
- semantic segmentation /
- UAV image /
- convolutional neural network /
- context information /
- feature fusion

HTML全文

图 1 模型整体结构

Figure 1. Overall structure of model

下载: 全尺寸图片幻灯片

图 2 特征增强模块

Figure 2. Feature enhancement module

下载: 全尺寸图片幻灯片

图 3 全局感知提取模块

Figure 3. Global perception extraction block

下载: 全尺寸图片幻灯片

图 4 原特征融合模块和改进特征融合模块的对比

Figure 4. Comparison of original feature fusion module and improved feature fusion module

下载: 全尺寸图片幻灯片

图 5 UAVid数据集上消融实验视觉对比结果

Figure 5. Visual comparison results of ablation experiments on UAVid dataset

下载: 全尺寸图片幻灯片

图 6 AeroScapes数据集上消融实验视觉对比结果

Figure 6. Visual comparison results of ablation experiments on AeroScape dataset

下载: 全尺寸图片幻灯片

图 7 部分算法在UAVid数据集上预测结果比较

Figure 7. Comparison of prediction results of some selected algorithms on UAVid dataset

下载: 全尺寸图片幻灯片

图 8 部分算法在AeroScapes数据集上预测结果比较

Figure 8. Comparison of prediction results of some selected algorithms on AeroScapes dataset

下载: 全尺寸图片幻灯片

表 1 UAVid数据集上各模块消融实验数据

Table 1. Data from ablation experiment of modules on UAVid dataset

Baseline	GPE模块	FEM	FFM	Swin-Block	mIoU/%	浮点运算速度/10⁹ s⁻¹	分割速度/(帧·s⁻¹)
√					63.9	14.78	71.6
√	√				64.8	12.82	69.0
√		√			65.2	11.58	74.8
√			√		64.4	10.24	82.2
√	√	√			65.7	11.79	74.1
√	√		√		65.0	12.50	74.2
√		√	√		65.4	11.24	81.7
√	√	√	√		66.5	12.93	79.9
√				√	65.3	21.65	38.3
√		√		√	66.2	20.07	42.1
√		√	√	√	66.9	19.84	45.6

下载: 导出CSV

表 2 AeroScapes数据集上各模块消融实验数据

Table 2. Data from ablation experiments of modules on AeroScapes dataset

Baseline	GPE模块	FEM	FFM	mIoU/%	分割速度/(帧·s⁻¹)
√				60.8	64.5
√	√			61.1	59.2
√		√		61.8	66.2
√			√	61.3	77.4
√	√	√		61.9	64.6
√	√		√	62.5	66.8
√		√	√	62.1	72.5
√	√	√	√	63.0	71.4

下载: 导出CSV

表 3 不同算法在UAVid数据集上实验数据比较

Table 3. Comparison of experimental data of different algorithms on UAVid dataset

模型	Backbone	mIoU/%	浮点运算速度/10⁹ s⁻¹	分割速度/(帧·s⁻¹)
SegNet^[10]		55.3	112.63	11.5
U-Net^[7]		58.4	64.01	12.6
SwiftNet^[17]	Resnet18	62.1	13.42	48.9
ICNet^[11]	PSPNet50	61.9	25.91	73.2
Deeplabv3+^[8]	MobileNetV2	61.4	20.47	71.2
BiSeNet^[13]	Resnet18	61.8	12.64	74.1
SFNet^[22]	Resnet18	65.3	18.52	62.4
BiSeNet V2^[15]		63.9	14.78	71.6
HyperSeg-S^[23]	EfficientNet-B1	64.3	10.12	27.5
PIDNet^[24]		62.0	5.07	74.8
本文		66.5	12.93	79.9

下载: 导出CSV

表 4 不同算法在AeroScapes数据集上实验数据比较

Table 4. Comparison of experimental data of different algorithms on AeroScapes dataset

模型	Backbone	参数量	mIoU/%	分割速度/(帧·s⁻¹)
SegNet^[10]		15.62×10⁶	50.6	11.8
U-Net^[7]		17.26×10⁶	51.4	11.1
SwiftNet^[17]	Resnet18	12.07×10⁶	60.1	51.1
ICNet^[11]	PSPNet50	23.15×10⁶	59.5	67.2
Deeplabv3+^[8]	MobileNetV2	5.82×10⁶	59.9	61.8
BiSeNet^[13]	Resnet18	13.62×10⁶	58.2	66.9
SFNet^[22]	Resnet18	12.72×10⁶	62.1	64.0
BiSeNet V2^[15]		5.21×10⁶	60.8	64.5
HyperSeg-S^[23]	EfficientNet-B1	10.24×10⁶	59.4	27.7
PIDNet^[24]		7.72×10⁶	60.9	66.5
本文		6.44×10⁶	63.0	71.4

下载: 导出CSV

参考文献(24)

[1]	宝音图, 刘伟, 李润生, 等. 遥感图像语义分割的空间增强注意力U型网络[J]. 北京航空航天大学学报, 2023, 49(7): 1828-1837. BAO Y T, LIU W, LI R S, et al. Semantic segmentation of remote sensing images based on U-shaped network combined with spatial enhance attention[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(7): 1828-1837(in Chinese).
[2]	吴泽康, 赵姗, 李宏伟, 等. 遥感图像语义分割空间全局上下文信息网络[J]. 浙江大学学报(工学版), 2022, 56(4): 795-802. WU Z K, ZHAO S, LI H W, et al. Spatial global context information network for semantic segmentation of remote sensing image[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(4): 795-802(in Chinese).
[3]	LIU S Y, CHENG J, LIANG L K, et al. Light-weight semantic segmentation network for UAV remote sensing images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 8287-8296.
[4]	ZHANG G, LI Z, TANG C, et al. CEDNet: a cascade encoder-decoder network for dense prediction[J]. Pattern Recognition, 2025, 158: 111072.
[5]	OUYANG D L, HE S, ZHANG G Z, et al. Efficient multi-scale attention module with cross-spatial learning[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2023: 1-5.
[6]	YU C Q, XIAO B, GAO C X, et al. Lite-HRNet: a lightweight high-resolution network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 10435-10445.
[7]	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of the Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[8]	CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 833-851.
[9]	PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL]. (2016-06-07)[2023-12-01]. https://arxiv.org/abs/1606.02147.
[10]	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[11]	ZHAO H S, QI X J, SHEN X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 418-434.
[12]	WU T Y, TANG S, ZHANG R, et al. CGNet: a light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2020, 30: 1169-1179.
[13]	YU C Q, WANG J B, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 334-349.
[14]	DING X H, ZHANG X Y, MA N N, et al. RepVGG: making VGG-style ConvNets great again[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13728-13737.
[15]	YU C Q, GAO C X, WANG J B, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068.
[16]	YI S, LIU X, LI J J, et al. UAVformer: a composite transformer network for urban scene segmentation of UAV images[J]. Pattern Recognition, 2023, 133: 109019.
[17]	WOO S, DEBNATH S, HU R H, et al. Convnext v2: co-designing and scaling ConvNets with masked autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 16133-16142.
[18]	ZHANG H, WU C R, ZHANG Z Y, et al. ResNeSt: split-attention networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 2735-2745.
[19]	ORŠIC M, KREŠO I, BEVANDIC P, et al. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 12599-12608.
[20]	ZHANG H K, HU W Z, WANG X Y. ParC-net: position aware circular convolution with merits from ConvNets and Transformer[C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2022: 613-630.
[21]	LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 9992-10002.
[22]	LI X T, YOU A S, ZHU Z, et al. Semantic flow for fast and accurate scene parsing[C]//Proceedings of the 16th European Conference on Computer Vision. Berlin: Springer, 2020: 775-793.
[23]	NIRKIN Y, WOLF L, HASSNER T. HyperSeg: patch-wise hypernetwork for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 4060-4069.
[24]	XU J C, XIONG Z X, BHATTACHARYYA S P. PIDNet: a real-time semantic segmentation network inspired by PID controllers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 19529-19539.