Object detection algorithm for UAV viewpoint images based on feature information complementation and enhancement

WU Kaijun, PU Zhuo

doi: 10.13700/j.bh.1001-5965.2024.0190

School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730000, China

Funds:

Natural Science Foundation of Gansu Province (23JRRA913); Inner Mongolia Autonomous Region Key Research and Development and Achievement Transformation Program Project (2023YFSH0043); Key Research and Development Project of Lanzhou Jiaotong University (ZDYF2304)

Corresponding author: E-mail: shiyuepz@163.com

CLC number: V279; TP391.41

Publication history
- Received: 2024-04-01
- Accepted: 2024-05-20
- Published online: 2024-09-05
- Issue published: 2026-05-31

Abstract:
Images captured from an unmanned aerial vehicle (UAV) viewpoint exhibit large variations in object scale, a high proportion of small objects, and severe background noise interference. To address these problems, an object detection algorithm based on feature information enhancement and complementation is proposed. First, to exploit high-level semantic information and capture richer multi-scale information, a multivariate fusion spatial pyramid pooling-fast (MFSPPF) method is introduced. Second, a multi-branch semantic enhancement (MBSE) module is designed, which extracts rich multi-scale features through several branches and builds connections among them, so that important feature information is not lost as it is passed through feature fusion. Third, a detailed feature complementation (DFC) module is proposed, which extracts and refines low-level feature information into rich fine-grained features and, through feature fusion, supplements the detail information in high-level features. Experiments on the VisDrone2021 dataset show that, compared with the baseline YOLOv8m, the proposed algorithm improves average precision (AP), AP50, AP75, AP(s), AP(m), and AP(l) by 3.7, 5.4, 3.9, 3.1, 4.0, and 7.9 percentage points, respectively. The proposed method is also applicable to other YOLOv8 models. Compared with other algorithms, the proposed algorithm delivers strong detection performance across different intersection over union (IoU) thresholds and object sizes while maintaining a fast detection speed, which makes it suitable for object detection in UAV viewpoint images.

Keywords:
- object detection
- UAV images
- YOLOv8
- feature enhancement
- fine-grained feature
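
The abstract reports results at different IoU thresholds (AP50, AP75) and for different object sizes (AP(s), AP(m), AP(l)). The following is a minimal sketch of what those subscripts refer to, not the paper's evaluation code. It assumes COCO-style conventions, which this page does not state explicitly: a prediction counts as a match when its IoU with a ground-truth box reaches the threshold, and objects are bucketed by pixel area at 32² and 96². The box layout and the names `iou`, `is_match`, and `size_bucket` are illustrative.

```python
from typing import Tuple

# Illustrative box layout (an assumption): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float) -> bool:
    """True if the prediction overlaps the ground truth enough at this threshold."""
    return iou(pred, gt) >= threshold


def size_bucket(width: float, height: float) -> str:
    """COCO-style size bucket (assumed convention; boundary handling simplified)."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


gt = (0.0, 0.0, 10.0, 10.0)
pred = (2.0, 0.0, 12.0, 10.0)      # same size, shifted 2 px to the right
print(round(iou(pred, gt), 3))      # intersection 80, union 120
print(is_match(pred, gt, 0.50))     # counts as a hit for AP50
print(is_match(pred, gt, 0.75))     # does not count for AP75
print(size_bucket(20, 20))          # a 20x20 px object falls in the small bucket
```

Under these assumptions, a small localization error that is harmless at the 0.50 threshold can fail at 0.75, which is why AP75 is the stricter of the two figures.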

Figure 1. Architecture of the proposed algorithm

Figure 2. Multivariate fusion spatial pyramid pooling-fast (MFSPPF) module structure

Figure 3. Multi-branch semantic enhancement (MBSE) module structure

Figure 4. Detailed feature complementation (DFC) module structure

Figure 5. Confusion matrices of the baseline algorithm and the proposed algorithm on the VisDrone validation set

Figure 6. Visualization results of the proposed algorithm

Table 1. Results of ablation experiments on the VisDrone validation set (√: component used; -: not used)

Baseline	MFSPPF	DFC	MBSE	AP/%	AP50/%	AP75/%	AP(s)/%	AP(m)/%	AP(l)/%	Parameters
√	-	-	-	33.7	53.5	35.7	25.5	46.0	42.5	25.85×10⁶
√	√	-	-	37.1	58.5	39.0	28.3	49.2	47.3	26.34×10⁶
√	-	√	-	36.7	58.1	38.7	28.0	49.0	48.0	26.81×10⁶
√	-	-	√	37.0	58.4	39.1	27.9	49.6	52.3	32.71×10⁶
√	√	√	√	37.4	58.9	39.6	28.6	50.0	50.4	35.47×10⁶
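
The gains quoted in the abstract can be checked directly against the ablation rows. The sketch below copies the values from Table 1 and subtracts the baseline row from each configuration. The dictionary keys and the helper name `gains` are illustrative labels, not identifiers from the paper.

```python
# Metric order follows the columns of Table 1.
METRICS = ("AP", "AP50", "AP75", "AP(s)", "AP(m)", "AP(l)")

# Values copied from Table 1 (VisDrone validation set, in %).
ABLATION = {
    "baseline": (33.7, 53.5, 35.7, 25.5, 46.0, 42.5),
    "+MFSPPF": (37.1, 58.5, 39.0, 28.3, 49.2, 47.3),
    "+DFC": (36.7, 58.1, 38.7, 28.0, 49.0, 48.0),
    "+MBSE": (37.0, 58.4, 39.1, 27.9, 49.6, 52.3),
    "+MFSPPF+DFC+MBSE": (37.4, 58.9, 39.6, 28.6, 50.0, 50.4),
}

# Parameter counts in millions, also from Table 1.
PARAMS_M = {"baseline": 25.85, "+MFSPPF+DFC+MBSE": 35.47}


def gains(config: str) -> dict:
    """Gain of a configuration over the baseline, in percentage points."""
    base = ABLATION["baseline"]
    return {m: round(v - b, 1) for m, v, b in zip(METRICS, ABLATION[config], base)}


for name in ABLATION:
    if name != "baseline":
        print(name, gains(name))

# Relative growth in parameter count of the full model over the baseline, in %.
param_growth = round(
    100 * (PARAMS_M["+MFSPPF+DFC+MBSE"] - PARAMS_M["baseline"]) / PARAMS_M["baseline"], 1
)
print("parameter growth / %:", param_growth)
```

The full configuration reproduces the 3.7, 5.4, 3.9, 3.1, 4.0, and 7.9 point gains stated in the abstract. The table also shows that MBSE alone gives the highest AP(l) (52.3), above the full model's 50.4.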

Table 2. Detection performance of other YOLOv8 models before and after applying the proposed improvements

Algorithm	AP/%	AP50/%	AP75/%	AP(s)/%	AP(m)/%	AP(l)/%	Parameters
YOLOv8n [5]	28.6	46.8	29.5	19.7	40.1	45.1	3.01×10⁶
YOLOv8n+MFSPPF+DFC+MBSE	31.1	50.1	32.4	22.7	42.6	46.9	4.40×10⁶
YOLOv8s [5]	34.0	54.5	35.3	24.9	46.0	48.1	11.14×10⁶
YOLOv8s+MFSPPF+DFC+MBSE	35.2	56.2	36.4	26.3	47.4	50.4	16.63×10⁶

Table 3. Comparison with other algorithms on the VisDrone test set

Algorithm	Backbone	AP/%	AP50/%	AP75/%	AP(s)/%	AP(m)/%	AP(l)/%	Inference time/ms
RetinaNet [6]	ResNet-50	8.0	15.5	7.6	2.1	13.2	23.7	22.7
Faster R-CNN [12]	ResNet-50	12.8	23.9	12.6	5.2	21.1	29.7	33.4
YOLOv5m [30]	CSPDarkNet	21.3	37.7	21.7	13.2	31.2	33.4	21.1
YOLOXm [10]	CSPDarkNet	19.7	36.5	19.1	13.0	28.0	23.9	28.1
TOOD [31]	ResNet-50	22.9	38.2	23.8	13.9	34.3	43.1	43.8
VFNet [32]	ResNet-50	17.6	30.3	17.9	10.0	26.7	36.4	43.4
YOLOv8m [5]	CSPDarkNet	25.8	42.4	27.1	16.8	37.4	39.3	18.4
Gold-YOLOm [33]	CSPDarkNet	25.9	43.9	26.5	15.6	37.1	48.7	28.2
Proposed algorithm	CSPDarkNet	28.7	47.4	29.8	18.4	41.0	49.0	28.6
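
To make the speed claim concrete, the inference times in Table 3 can be converted to approximate throughput. This is a rough sketch that assumes each reported time covers a single image, so frames per second is 1000 divided by the time in milliseconds. The hardware and batch size are not given in this section, so the figures are only comparable within the table. The names `RESULTS` and `fps` are illustrative.

```python
# (AP in %, inference time in ms), copied from Table 3 for three of the rows.
RESULTS = {
    "YOLOv8m": (25.8, 18.4),
    "Gold-YOLOm": (25.9, 28.2),
    "Proposed algorithm": (28.7, 28.6),
}


def fps(inference_ms: float) -> float:
    """Approximate frames per second, assuming one image per inference call."""
    return 1000.0 / inference_ms


for name, (ap, ms) in RESULTS.items():
    print(f"{name}: AP {ap}%, {ms} ms, about {fps(ms):.1f} FPS")

# Trade-off of the proposed algorithm against the YOLOv8m baseline.
base_ap, base_ms = RESULTS["YOLOv8m"]
ours_ap, ours_ms = RESULTS["Proposed algorithm"]
ap_gain = round(ours_ap - base_ap, 1)      # percentage points
extra_ms = round(ours_ms - base_ms, 1)     # additional milliseconds per image
print(f"AP gain: {ap_gain} points for {extra_ms} ms extra per image")
```

Under that assumption, the proposed algorithm runs at roughly 35 FPS against roughly 54 FPS for YOLOv8m, trading about 10 ms per image for a 2.9 point AP gain on the test set.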

References (33)

[1]	JIANG B, QU R K, LI Y D, et al. Object detection in UAV imagery based on deep learning: review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524519 (in Chinese).
[2]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2014: 740-755.
[3]	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
[4]	ZHU P, WEN L, DU D, et al. Detection and tracking meet drones challenge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(11): 7380-7399.
[5]	JOCHER G. YOLOv8 by ultralytics[EB/OL]. (2023-09-27)[2024-02-10]. https://github.com/ultralytics/ultralytics.
[6]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2999-3007.
[7]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2016: 21-37.
[8]	TIAN Z, SHEN C H, CHEN H, et al. FCOS: a simple and strong anchor-free object detector[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(4): 1922-1933.
[9]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 779-788.
[10]	GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06)[2024-03-29]. https://arxiv.org/abs/2107.08430.
[11]	WANG C Y, BOCHKOVSKIY A, LIAO H M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 7464-7475.
[12]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[13]	HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2980-2988.
[14]	CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6154-6162.
[15]	WU Y, CHEN Y P, YUAN L, et al. Rethinking classification and localization for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10183-10192.
[16]	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2020: 213-229.
[17]	ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2024-03-29]. https://arxiv.org/abs/2010.04159.
[18]	LI F, ZHANG H, LIU S L, et al. DN-DETR: accelerate DETR training by introducing query DeNoising[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 13609-13617.
[19]	ZHAO Y, LV W Y, XU S L, et al. DETRs beat YOLOs on real-time object detection[EB/OL]. (2023-04-17)[2024-03-29]. https://arxiv.org/abs/2304.08069.
[20]	KIEFER B, OTT D, ZELL A. Leveraging synthetic data in object detection on unmanned aerial vehicles[C]//Proceedings of the 26th International Conference on Pattern Recognition. Piscataway: IEEE Press, 2022: 3564-3571.
[21]	CHEN Y T, LI J, NIU Y F, et al. Small object detection networks based on classification-oriented super-resolution GAN for UAV aerial imagery[C]//Proceedings of the Chinese Control and Decision Conference. Piscataway: IEEE Press, 2019: 4610-4615.
[22]	WANG D W, HU L C, FANG J, et al. Small target detection algorithm based on improved Double-Head RCNN for UAV aerial images[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(7): 2141-2149 (in Chinese).
[23]	MAO G T, DENG T M, YU N J. Object detection in UAV images based on multi-scale split attention[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(5): 268-278 (in Chinese).
[24]	LI C L, YANG T, ZHU S J, et al. Density map guided object detection in aerial images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2020: 737-746.
[25]	LENG J X, MO M, ZHOU Y H, et al. Pareto refocusing for drone-view object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(3): 1320-1334.
[26]	DENG S T, LI S, XIE K, et al. A global-local self-adaptive network for drone-view object detection[J]. IEEE Transactions on Image Processing, 2020, 30: 1556-1569.
[27]	CHEN N Y, LI Y, YANG Z M, et al. LODNU: lightweight object detection network in UAV vision[J]. The Journal of Supercomputing, 2023, 79(9): 10117-10138.
[28]	YU G H, CHANG Q Y, LV W Y, et al. PP-PicoDet: a better real-time object detector on mobile devices[EB/OL]. (2021-11-01)[2024-03-30]. https://arxiv.org/abs/2111.00902.
[29]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[30]	JOCHER G. YOLOv5 by ultralytics (version 7.0) [EB/OL]. (2022-11-22)[2024-02-10]. https://doi.org/10.5281/zenodo.3908559.
[31]	FENG C J, ZHONG Y J, GAO Y, et al. TOOD: task-aligned one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 3490-3499.
[32]	ZHANG H Y, WANG Y, DAYOUB F, et al. VarifocalNet: an IoU-aware dense object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 8510-8519.
[33]	WANG C C, HE W, NIE Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism[EB/OL]. (2023-09-20)[2024-03-30]. https://arxiv.org/abs/2309.11331.