Small object detection in UAV aerial images based on inverted residual attention

Liu Shudong (刘树东); Liu Yehui (刘业辉); Sun Yemei (孙叶美); Li Yifei (李懿霏); Wang Jiao (王娇)

doi: 10.13700/j.bh.1001-5965.2022.0362

School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China

Funds: Science and Technology Program of Tianjin (20YDTPJC01310)

Corresponding author: E-mail: wangjiaoq@163.com

CLC number: TP391.4

Publication history
- Received: 2022-05-16
- Accepted: 2022-08-19
- Published online: 2022-10-18
- Issue published: 2023-03-30

Abstract:
UAV aerial images have complex backgrounds and contain many small objects. To address these problems, a small object detection algorithm for UAV aerial images based on inverted residual attention is proposed. First, an inverted residual module and an inverted residual attention module are embedded in the backbone network; by mapping feature information from low dimensions to high dimensions, they capture rich spatial information and deep semantic information about small objects and improve small-object detection accuracy. Second, a multi-scale feature fusion module in the feature fusion stage fuses shallow spatial information with deep semantic information and generates four detection heads with different receptive fields, which improves recognition of small objects and reduces missed detections. Finally, a mosaic mixed data augmentation method is designed to establish linear relationships between data samples, increase the complexity of image backgrounds, and improve the robustness of the algorithm. Experimental results on the VisDrone dataset show that the mean average precision (mAP) of the proposed model is 1.2 percentage points higher than that of DSHNet (27.4% versus 26.2% for Cascade R-CNN+DSHNet), effectively reducing missed and false detections of small objects in UAV aerial images.
Keywords:
- object detection
- UAV images
- inverted residual
- attention
- multi-scale feature fusion
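
The low-to-high dimensional mapping described in the abstract can be illustrated with a generic inverted residual block: a 1x1 convolution expands the channels, a depthwise 3x3 convolution filters each expanded channel spatially, and a linear 1x1 convolution projects back before the residual addition. The sketch below is a minimal pure-Python illustration under that assumption; the function names, the ReLU6 activation, and the toy weights are hypothetical, and it is not the authors' IRC3 or IRAC3 module.

```python
def pointwise(x, w):
    """1x1 convolution. x: C_in x H x W nested lists; w: C_out x C_in weights."""
    height, width = len(x[0]), len(x[0][0])
    return [[[sum(w[o][i] * x[i][r][c] for i in range(len(x)))
              for c in range(width)]
             for r in range(height)]
            for o in range(len(w))]


def depthwise3x3(x, kernels):
    """Per-channel 3x3 convolution, stride 1, zero padding (one kernel per channel)."""
    out = []
    for channel, k in zip(x, kernels):
        height, width = len(channel), len(channel[0])
        result = [[0.0] * width for _ in range(height)]
        for r in range(height):
            for c in range(width):
                acc = 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < height and 0 <= cc < width:
                            acc += k[dr + 1][dc + 1] * channel[rr][cc]
                result[r][c] = acc
        out.append(result)
    return out


def relu6(x):
    """Clamp every activation to [0, 6] (assumed activation, as in MobileNet-style blocks)."""
    return [[[min(max(v, 0.0), 6.0) for v in row] for row in ch] for ch in x]


def inverted_residual(x, w_expand, dw_kernels, w_project):
    h = relu6(pointwise(x, w_expand))        # low-dimensional -> high-dimensional
    h = relu6(depthwise3x3(h, dw_kernels))   # spatial filtering in the expanded space
    h = pointwise(h, w_project)              # linear projection back to low dimension
    # Residual addition: valid because stride is 1 and output channels equal input channels.
    return [[[a + b for a, b in zip(row_x, row_h)]
             for row_x, row_h in zip(ch_x, ch_h)]
            for ch_x, ch_h in zip(x, h)]


# Toy example: 1 input channel expanded to 2 channels, then projected back to 1.
x = [[[1.0, 2.0], [3.0, 4.0]]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
y = inverted_residual(x, [[1.0], [1.0]], [identity, identity], [[0.5, 0.5]])
print(y)
```

With these toy weights the block path reproduces the input, so the residual output is simply twice the input; real weights would be learned.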

Figure 1. Structure of the small object detection model for UAV aerial images based on inverted residual attention
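
The abstract states that the feature fusion stage combines shallow spatial information with deep semantic information and produces four detection heads. A minimal sketch of one common way to do this, a top-down pathway that upsamples each deeper map and concatenates it with the next shallower one, is given below. The nearest-neighbor upsampling, the channel concatenation, and the helper names are assumptions for illustration; the actual fusion module in Figure 1 may differ, and a real network would apply convolutions after each concatenation.

```python
def constant_map(channels, size, value):
    """Hypothetical helper: a channels x size x size feature map filled with one value."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]


def upsample2x(feature_map):
    """Nearest-neighbor 2x upsampling of a C x H x W nested-list feature map."""
    return [[[v for v in row for _ in range(2)]
             for row in channel for _ in range(2)]
            for channel in feature_map]


def top_down_fusion(features):
    """Fuse backbone maps ordered shallow -> deep (each half the size of the previous).

    Returns one fused map per level, ordered shallow -> deep, so four input
    levels yield four outputs that could each feed a detection head.
    """
    fused = [features[-1]]                      # start from the deepest map
    for shallow in reversed(features[:-1]):
        upsampled = upsample2x(fused[-1])       # bring deep semantics to higher resolution
        fused.append(shallow + upsampled)       # channel-wise concatenation
    return fused[::-1]


# Four backbone levels with spatial sizes 8, 4, 2, 1 and one channel each.
features = [constant_map(1, s, float(s)) for s in (8, 4, 2, 1)]
heads = top_down_fusion(features)
shapes = [(len(h), len(h[0]), len(h[0][0])) for h in heads]
print(shapes)
```

In this toy setup the shallowest output carries channels from all four levels at the highest resolution, which is the property that helps small objects.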

Figure 2. IRC3 module

Figure 3. IRAC3 module

Figure 4. Depthwise separable convolution module
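
A depthwise separable convolution replaces one standard convolution with a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. The short sketch below compares weight counts for the two forms, ignoring biases; the layer sizes in the example are illustrative assumptions, not values from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard      # equals 1/c_out + 1/k**2
print(standard, separable, round(ratio, 4))
```

The ratio reduces to 1/c_out + 1/k², so for 3x3 kernels the separable form needs roughly one ninth of the weights when the output channel count is large.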

Figure 5. ECA-Net module
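
As a rough illustration of efficient channel attention in the style of ECA-Net, the sketch below pools each channel to a single value, applies a 1D convolution across neighboring channels, passes the result through a sigmoid, and rescales the channels. The kernel-size rule follows the adaptive formula published with ECA-Net (gamma = 2, b = 1); the convolution weights would normally be learned, so the values used here are placeholders, and how the module is wired into IRAC3 is not shown.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd number derived from log2(channels)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature, weights):
    """Channel attention on a C x H x W nested-list feature map.

    weights: odd-length 1D convolution kernel shared across all channels.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Local cross-channel interaction: 1D convolution with zero padding.
    k = len(weights)
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = []
    for c in range(len(pooled)):
        s = sum(weights[j] * padded[c + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-s)))   # sigmoid gate in (0, 1)
    # Excite: rescale every channel by its gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature, gates)]


feature = [[[1.0]], [[2.0]], [[3.0]]]          # three channels, 1x1 spatial size
out = eca(feature, [0.0, 1.0, 0.0])            # placeholder kernel (identity)
print(eca_kernel_size(64), eca_kernel_size(256), out)
```

Because the kernel only spans a few neighboring channels, the attention adds very few parameters compared with a fully connected squeeze-and-excitation layer.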

Figure 6. UAV aerial image

Figure 7. Object distribution image

Figure 8. Fusion enhancement method process
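
The abstract describes the augmentation as a mosaic mixed method that establishes a linear relationship between data samples. One plausible reading, sketched below, is to stitch four images into a mosaic and then blend two mosaics linearly with a weight lam, as in mixup. The equal-size 2x2 tiling, the fixed lam, and the function names are simplifying assumptions; a practical pipeline would also randomize the layout and remap bounding boxes, which this sketch omits.

```python
def mosaic(top_left, top_right, bottom_left, bottom_right):
    """Stitch four equal-size grayscale images (H x W nested lists) into a 2x2 grid."""
    top = [a + b for a, b in zip(top_left, top_right)]
    bottom = [a + b for a, b in zip(bottom_left, bottom_right)]
    return top + bottom


def mixup(img_a, img_b, lam):
    """Pixel-wise linear blend: lam * img_a + (1 - lam) * img_b."""
    return [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def mosaic_mixup(tiles_a, tiles_b, lam=0.5):
    """Build one mosaic from each group of four tiles, then blend the two mosaics."""
    return mixup(mosaic(*tiles_a), mosaic(*tiles_b), lam)


# Toy 1x1 "images" so the result is easy to follow by hand.
tiles_a = ([[0.0]], [[2.0]], [[4.0]], [[6.0]])
tiles_b = ([[8.0]], [[8.0]], [[8.0]], [[8.0]])
blended = mosaic_mixup(tiles_a, tiles_b, lam=0.5)
print(blended)
```

Blending two mosaics superimposes up to eight source images in one training sample, which is one way the background complexity mentioned in the abstract could be increased.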

Figure 9. Detection results of different models

Figure 10. Detection results of three models

Figure 11. Detection results of the proposed model under different backgrounds

Table 1. Comparison of objective indicators of different models

Model	mAP/%	mAP0.5/%	mAP0.75/%	Parameters/10⁶	Detection speed/FPS
YOLOv5x	23.4	35.7	25.1	83.2	41.3
Model 1	24.6	38.6	26.2	86.7	28.7
Model 2	25.4	39.7	27.1	85.5	28.7
Model 3	26.8	41.4	28.8	69.3	25.6
Model 4	27.4	42.4	29.0	72.5	23.4
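
The trade-offs in Table 1 are easier to read as differences from the YOLOv5x baseline. The sketch below recomputes the step-by-step mAP gains and the overall change in parameters and speed from the tabulated values only; what each numbered model adds is not stated in this section, so the rows are treated simply as consecutive variants.

```python
# (model, mAP/%, parameters/10^6, detection speed/FPS), copied from Table 1.
ROWS = [
    ("YOLOv5x", 23.4, 83.2, 41.3),
    ("Model 1", 24.6, 86.7, 28.7),
    ("Model 2", 25.4, 85.5, 28.7),
    ("Model 3", 26.8, 69.3, 25.6),
    ("Model 4", 27.4, 72.5, 23.4),
]


def stepwise_map_gains(rows):
    """mAP gain (percentage points) of each row over the row before it."""
    return [(cur[0], round(cur[1] - prev[1], 1)) for prev, cur in zip(rows, rows[1:])]


def overall_change(rows):
    """Compare the last row with the first: mAP gain, parameter change, speed ratio."""
    first, last = rows[0], rows[-1]
    return {
        "map_gain_points": round(last[1] - first[1], 1),
        "param_change_percent": round(100 * (last[2] - first[2]) / first[2], 1),
        "speed_percent_of_baseline": round(100 * last[3] / first[3], 1),
    }


gains = stepwise_map_gains(ROWS)
summary = overall_change(ROWS)
print(gains)
print(summary)
```

Read this way, the final model gains 4.0 mAP points over YOLOv5x with fewer parameters, at the cost of running at a little over half the baseline frame rate.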

Table 2. Comparison of detection results of different algorithms

Algorithm	Backbone	mAP/%	Pedestrian AP/%	Person AP/%	Bicycle AP/%	Car AP/%	Van AP/%	Truck AP/%	Tricycle AP/%	Awning-tricycle AP/%	Bus AP/%	Motor AP/%
RetinaNet	R50	13.9	13.0	7.9	1.4	45.5	19.9	11.5	6.3	4.2	17.8	11.8
Faster R-CNN	X101	22.4	21.3	15.5	7.9	52.0	29.5	20.5	14.7	8.9	32.1	21.6
Cascade R-CNN	R50	23.2	22.2	14.8	7.6	54.6	31.5	21.6	14.8	8.6	34.9	21.4
Faster R-CNN+MMF	R50	22.6	21.6	15.3	9.6	51.5	28.5	20.4	15.9	7.5	33.7	21.6
Faster R-CNN+SimCal	R50	20.0	18.7	13.8	5.7	51.0	28.4	16.4	13.6	5.9	27.0	19.4
Faster R-CNN+BGS	R50	23.0	21.8	16.0	8.1	51.8	31.1	19.8	15.0	8.4	36.1	21.5
RetinaNet+DSHNet	R50	16.1	14.1	8.9	1.3	48.2	24.8	14.2	8.8	6.0	21.6	13.1
Faster R-CNN+DSHNet	R50	24.6	22.5	16.5	10.1	52.8	32.6	22.1	17.5	8.8	39.5	23.7
Faster R-CNN+DSHNet	X101	25.8	23.3	16.7	11.4	53.7	33.1	23.8	19.5	11.1	40.0	25.5
Cascade R-CNN+DSHNet	R50	26.2	23.2	16.1	11.2	55.5	33.5	25.2	19.1	10.0	43.0	25.1
Proposed model	CSPDarknet53	27.4	28.9	6.0	9.5	60.1	36.2	34.5	16.7	17.2	47.2	18.0
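
For the two strongest rows of Table 2, the reported mAP agrees with the unweighted mean of the ten per-class AP values, and the gap between them is the 1.2-point improvement over DSHNet quoted in the abstract. The check below uses only numbers from the table; that the authors computed mAP exactly this way is an assumption.

```python
# Per-class AP/% in table order: Pedestrian, Person, Bicycle, Car, Van,
# Truck, Tricycle, Awning-tricycle, Bus, Motor (values copied from Table 2).
CLASS_AP = {
    "Cascade R-CNN+DSHNet (R50)": [23.2, 16.1, 11.2, 55.5, 33.5, 25.2, 19.1, 10.0, 43.0, 25.1],
    "Proposed model (CSPDarknet53)": [28.9, 6.0, 9.5, 60.1, 36.2, 34.5, 16.7, 17.2, 47.2, 18.0],
}


def mean_ap(per_class_ap):
    """Unweighted mean of per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap)


dshnet_map = mean_ap(CLASS_AP["Cascade R-CNN+DSHNet (R50)"])
proposed_map = mean_ap(CLASS_AP["Proposed model (CSPDarknet53)"])
improvement = proposed_map - dshnet_map

print(round(dshnet_map, 1), round(proposed_map, 1), round(improvement, 1))
```

The per-class values also show that the gain is uneven: the proposed model leads on seven of the ten classes but trails on Person, Bicycle, Tricycle, and Motor relative to this DSHNet variant.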

Table 3. Comparison of mean average precision and detection speed of different algorithms

Algorithm	Backbone	mAP/%	Detection speed/FPS
RetinaNet+DSHNet	R50	16.1	19
Faster R-CNN+DSHNet	R50	24.6	22.5
Cascade R-CNN+DSHNet	R50	26.2	15
Proposed model	CSPDarknet53	27.4	23.4

