Object detection algorithm for UAV viewpoint images based on feature information complementation and enhancement
Abstract: Images captured from an unmanned aerial vehicle (UAV) viewpoint exhibit large variations in object scale, a high proportion of small objects, and severe background noise. To address these problems, an object detection algorithm based on feature information enhancement and complementation is proposed. First, to capture richer multi-scale information from high-level semantic features, a multivariate fusion spatial pyramid pooling-fast (MFSPPF) method is introduced. Second, a multi-branch semantic enhancement (MBSE) module is designed; it extracts rich multi-scale features through several branches and builds links between them, which prevents important feature information from being lost as it is passed along during feature fusion. Third, a detailed feature complementation (DFC) module is proposed; it extracts and refines low-level feature information into rich fine-grained features, which are passed through feature fusion to supplement the detail information in the high-level features. Experiments on the VisDrone2021 dataset show that, compared with the baseline YOLOv8m, the proposed algorithm improves average precision (AP), AP50, AP75, AP(s), AP(m), and AP(l) by 3.7, 5.4, 3.9, 3.1, 4.0, and 7.9 percentage points, respectively. The proposed method is also applicable to the other YOLOv8 models. Compared with other algorithms, it achieves strong detection performance across intersection over union (IoU) thresholds and object sizes while maintaining a fast detection speed, which makes it suitable for object detection in UAV-viewpoint images.
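The abstract does not describe the internals of MFSPPF. For orientation only, the sketch below shows the pooling pattern of the standard SPPF block used in YOLOv5 and YOLOv8, which the name MFSPPF presumably builds on; it is not the authors' module. The function names, the pure-Python single-channel representation, and the toy feature map are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int = 5) -> Grid:
    """k x k max pooling with stride 1 and output size equal to input size.

    Out-of-bounds positions are ignored, which is equivalent to padding
    the map with negative infinity before pooling.
    """
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(
                x[i][j]
                for i in range(max(0, row - r), min(h, row + r + 1))
                for j in range(max(0, col - r), min(w, col + r + 1))
            )
            for col in range(w)
        ]
        for row in range(h)
    ]


def sppf_branches(x: Grid, k: int = 5) -> List[Grid]:
    """Return [x, p1, p2, p3], where each pooled map pools the previous one.

    A full SPPF block would concatenate these maps along the channel axis
    and mix them with a 1x1 convolution; that step is omitted here.
    """
    p1 = max_pool_same(x, k)
    p2 = max_pool_same(p1, k)
    p3 = max_pool_same(p2, k)
    return [x, p1, p2, p3]


# Hypothetical 12 x 12 single-channel map; two bright cells stand in for small objects.
feature = [[0.0] * 12 for _ in range(12)]
feature[2][3] = 1.0
feature[9][8] = 0.5
branches = sppf_branches(feature)
```

Chaining 5 x 5 pools yields effective windows of 5 x 5, 9 x 9, and 13 x 13, so one small kernel applied serially gives several context scales; how MFSPPF fuses such branches is not specified in the abstract.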
Key words: object detection; UAV images; YOLOv8; feature enhancement; fine-grained feature
Table 1. Results of ablation experiments on the VisDrone validation set
Components (Baseline, MFSPPF, DFC, MBSE)   AP/%   AP50/%   AP75/%   AP(s)/%   AP(m)/%   AP(l)/%   Parameters
√                                          33.7   53.5     35.7     25.5      46.0      42.5      25.85×10^6
√ √                                        37.1   58.5     39.0     28.3      49.2      47.3      26.34×10^6
√ √                                        36.7   58.1     38.7     28.0      49.0      48.0      26.81×10^6
√ √                                        37.0   58.4     39.1     27.9      49.6      52.3      32.71×10^6
√ √ √ √                                    37.4   58.9     39.6     28.6      50.0      50.4      35.47×10^6
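The gains quoted in the abstract can be re-derived from the first and last rows of Table 1. The short check below is a sketch of that arithmetic; the variable and function names are illustrative, and the relative parameter overhead is a derived figure that the source does not state.

```python
METRICS = ("AP", "AP50", "AP75", "AP(s)", "AP(m)", "AP(l)")

# Values copied from Table 1 (VisDrone validation set, in %).
baseline = dict(zip(METRICS, (33.7, 53.5, 35.7, 25.5, 46.0, 42.5)))
full_model = dict(zip(METRICS, (37.4, 58.9, 39.6, 28.6, 50.0, 50.4)))


def gains(ref, new):
    """Absolute gain in percentage points, rounded to one decimal."""
    return {m: round(new[m] - ref[m], 1) for m in ref}


def relative_increase(ref_value, new_value):
    """Relative increase in percent, rounded to one decimal."""
    return round(100.0 * (new_value - ref_value) / ref_value, 1)


ablation_gains = gains(baseline, full_model)
# Parameter counts in millions, also from Table 1.
param_overhead = relative_increase(25.85, 35.47)

print(ablation_gains)
print(f"parameter overhead: {param_overhead}%")
```

The differences are absolute percentage points, and they match the 3.7, 5.4, 3.9, 3.1, 4.0, and 7.9 reported in the abstract; the full model carries roughly 37% more parameters than the baseline.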
Table 2. Comparison of detection performance of other YOLOv8 models after improvement
Table 3. Comparative test results on the VisDrone test set
Algorithm            Backbone     AP/%   AP50/%   AP75/%   AP(s)/%   AP(m)/%   AP(l)/%   Inference time/ms
RetinaNet[6]         ResNet-50    8.0    15.5     7.6      2.1       13.2      23.7      22.7
Faster R-CNN[12]     ResNet-50    12.8   23.9     12.6     5.2       21.1      29.7      33.4
YOLOv5m[30]          CSPDarkNet   21.3   37.7     21.7     13.2      31.2      33.4      21.1
YOLOXm[10]           CSPDarkNet   19.7   36.5     19.1     13.0      28.0      23.9      28.1
TOOD[31]             ResNet-50    22.9   38.2     23.8     13.9      34.3      43.1      43.8
VFNet[32]            ResNet-50    17.6   30.3     17.9     10.0      26.7      36.4      43.4
YOLOv8m[5]           CSPDarkNet   25.8   42.4     27.1     16.8      37.4      39.3      18.4
Gold-YOLOm[33]       CSPDarkNet   25.9   43.9     26.5     15.6      37.1      48.7      28.2
Proposed algorithm   CSPDarkNet   28.7   47.4     29.8     18.4      41.0      49.0      28.6
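To put the speed and accuracy trade-off in Table 3 in more familiar terms, the sketch below converts inference time to frames per second and computes the gap to YOLOv8m. It assumes the reported time is per image at batch size 1, which the table does not state; the names used are illustrative.

```python
# (AP in %, inference time in ms) copied from Table 3.
results = {
    "YOLOv8m": (25.8, 18.4),
    "Gold-YOLOm": (25.9, 28.2),
    "Proposed": (28.7, 28.6),
}


def fps(ms_per_image):
    """Frames per second, assuming the time is per single image."""
    return round(1000.0 / ms_per_image, 1)


throughput = {name: fps(ms) for name, (_, ms) in results.items()}
ap_gain_vs_yolov8m = round(results["Proposed"][0] - results["YOLOv8m"][0], 1)
extra_latency_ms = round(results["Proposed"][1] - results["YOLOv8m"][1], 1)

print(throughput)
print(ap_gain_vs_yolov8m, extra_latency_ms)
```

Under that assumption, the proposed algorithm gains 2.9 AP points over YOLOv8m on the test set at a cost of 10.2 ms per image, and runs at about 35 frames per second, close to Gold-YOLOm.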
[1] JIANG B, QU R K, LI Y D, et al. Object detection in UAV imagery based on deep learning: review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524519 (in Chinese).
[2] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2014: 740-755.
[3] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
[4] ZHU P, WEN L, DU D, et al. Detection and tracking meet drones challenge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(11): 7380-7399.
[5] JOCHER G. YOLOv8 by ultralytics[EB/OL]. (2023-09-27)[2024-02-10]. https://github.com/ultralytics/ultralytics.
[6] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2999-3007.
[7] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2016: 21-37.
[8] TIAN Z, SHEN C H, CHEN H, et al. FCOS: a simple and strong anchor-free object detector[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(4): 1922-1933.
[9] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 779-788.
[10] GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06)[2024-03-29]. https://arxiv.org/abs/2107.08430.
[11] WANG C Y, BOCHKOVSKIY A, LIAO H M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 7464-7475.
[12] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[13] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2980-2988.
[14] CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6154-6162.
[15] WU Y, CHEN Y P, YUAN L, et al. Rethinking classification and localization for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10183-10192.
[16] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2020: 213-229.
[17] ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2024-03-29]. https://arxiv.org/abs/2010.04159.
[18] LI F, ZHANG H, LIU S L, et al. DN-DETR: accelerate DETR training by introducing query DeNoising[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 13609-13617.
[19] ZHAO Y, LV W Y, XU S L, et al. DETRs beat YOLOs on real-time object detection[EB/OL]. (2023-04-17)[2024-03-29]. https://arxiv.org/abs/2304.08069.
[20] KIEFER B, OTT D, ZELL A. Leveraging synthetic data in object detection on unmanned aerial vehicles[C]//Proceedings of the 26th International Conference on Pattern Recognition. Piscataway: IEEE Press, 2022: 3564-3571.
[21] CHEN Y T, LI J, NIU Y F, et al. Small object detection networks based on classification-oriented super-resolution GAN for UAV aerial imagery[C]//Proceedings of the Chinese Control and Decision Conference. Piscataway: IEEE Press, 2019: 4610-4615.
[22] WANG D W, HU L C, FANG J, et al. Small target detection algorithm based on improved Double-Head RCNN for UAV aerial images[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(7): 2141-2149 (in Chinese).
[23] MAO G T, DENG T M, YU N J. Object detection in UAV images based on multi-scale split attention[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(5): 268-278 (in Chinese).
[24] LI C L, YANG T, ZHU S J, et al. Density map guided object detection in aerial images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2020: 737-746.
[25] LENG J X, MO M, ZHOU Y H, et al. Pareto refocusing for drone-view object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(3): 1320-1334.
[26] DENG S T, LI S, XIE K, et al. A global-local self-adaptive network for drone-view object detection[J]. IEEE Transactions on Image Processing, 2020, 30: 1556-1569.
[27] CHEN N Y, LI Y, YANG Z M, et al. LODNU: lightweight object detection network in UAV vision[J]. The Journal of Supercomputing, 2023, 79(9): 10117-10138.
[28] YU G H, CHANG Q Y, LV W Y, et al. PP-PicoDet: a better real-time object detector on mobile devices[EB/OL]. (2021-11-01)[2024-03-30]. https://arxiv.org/abs/2111.00902.
[29] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[30] JOCHER G. YOLOv5 by ultralytics (version 7.0)[EB/OL]. (2022-11-22)[2024-02-10]. https://doi.org/10.5281/zenodo.3908559.
[31] FENG C J, ZHONG Y J, GAO Y, et al. TOOD: task-aligned one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 3490-3499.
[32] ZHANG H Y, WANG Y, DAYOUB F, et al. VarifocalNet: an IoU-aware dense object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 8510-8519.
[33] WANG C C, HE W, NIE Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism[EB/OL]. (2023-09-20)[2024-03-30]. https://arxiv.org/abs/2309.11331.

