Small target detection algorithm based on improved Double-Head RCNN for UAV aerial images
Keywords:
- small target detection
- UAV aerial image
- Double-Head RCNN
- Transformer
- content-aware reassembly of features (CARAFE)
Abstract: Small targets in unmanned aerial vehicle (UAV) aerial images carry little feature information and are easily disturbed by noise, which leads to high missed-detection and false-detection rates in existing algorithms. To address these problems, a small target detection algorithm based on an improved Double-Head region-based convolutional neural network (RCNN) was proposed for UAV aerial images. First, Transformer and deformable convolution network (DCN) modules were introduced into the ResNet-50 backbone to extract small-target feature and semantic information more effectively. Second, a feature pyramid network (FPN) structure based on content-aware reassembly of features (CARAFE) was proposed to keep small-target features from being drowned out by background noise and lost during feature fusion. Third, the anchor generation scales of the region proposal network were reset according to the scale distribution of small targets to further improve detection performance. Experimental results on the VisDrone-DET2021 dataset show that the proposed algorithm extracts more representative small-target feature and semantic information. Compared with the original Double-Head RCNN, the proposed algorithm adds 9.73×10⁶ parameters and loses 0.6 FPS, but improves AP, AP50, and AP75 by 2.6%, 6.2%, and 2.1%, respectively, and APs by 3.1%.
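To make the backbone modification concrete, the sketch below shows how a deformable convolution (DCN [19]) layer can be built with torchvision, with sampling offsets predicted from the input by an ordinary convolution. It is a minimal illustration under assumed names and channel sizes; the paper's R50-Attention backbone combines such DCN layers with Transformer attention inside ResNet-50, and only the DCN part is sketched here.

```python
# Minimal sketch of a deformable-convolution layer of the kind the improved
# backbone uses (DCN [19]); names and channel sizes are assumptions.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """3x3 deformable conv whose sampling offsets are predicted
    from the input by an ordinary 3x3 conv."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        # 2 offsets (dx, dy) for each of the k*k kernel taps
        self.offset_conv = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)   # start as a regular conv
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)              # (B, 2*k*k, H, W)
        return self.deform_conv(x, offsets)        # content-adaptive sampling

# quick shape check on a ResNet-stage-sized feature map
feat = torch.randn(1, 256, 64, 64)
print(DeformableBlock(256, 256)(feat).shape)       # torch.Size([1, 256, 64, 64])
```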
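The CARAFE operator underlying the proposed CARAFE-FPN replaces fixed bilinear or nearest-neighbor upsampling with reassembly kernels predicted from the feature content itself, so that upsampled small-target features are re-weighted by content rather than blurred uniformly [21]. The following PyTorch sketch illustrates the idea; the class name and hyper-parameters (compressed channels, kernel sizes) are assumptions, not the authors' exact implementation.

```python
# Minimal sketch of CARAFE-style content-aware upsampling [21];
# hyper-parameters and module structure are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFEUpsample(nn.Module):
    def __init__(self, channels: int, scale: int = 2,
                 k_up: int = 5, k_enc: int = 3, c_mid: int = 64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, c_mid, 1)  # channel compressor
        # predict one k_up*k_up reassembly kernel per output pixel
        self.encode = nn.Conv2d(c_mid, (scale * k_up) ** 2,
                                k_enc, padding=k_enc // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # 1) kernel prediction, normalized so each kernel sums to 1
        kernels = F.pixel_shuffle(self.encode(self.compress(x)), s)
        kernels = F.softmax(kernels, dim=1)            # (b, k*k, s*h, s*w)
        # 2) gather the k*k neighborhood of every source location
        neigh = F.unfold(x, k, padding=k // 2).view(b, c * k * k, h, w)
        neigh = F.interpolate(neigh, scale_factor=s, mode='nearest')
        neigh = neigh.view(b, c, k * k, s * h, s * w)
        # 3) reassemble: content-aware weighted sum over the neighborhood
        return (neigh * kernels.unsqueeze(1)).sum(dim=2)

# upsampling a coarse FPN level to the next finer one, as in feature fusion
print(CARAFEUpsample(256)(torch.randn(1, 256, 32, 32)).shape)  # [1, 256, 64, 64]
```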
Table 1. Comparison between the proposed algorithm and state-of-the-art algorithms

| Algorithm | Backbone | Input image resolution/pixel | Epochs | AP/% | AP50/% | AP75/% | APs/% | APm/% | APl/% | FPS |
| RetinaNet+PVT v2[4] | PVTv2-B1 | 1333×800 | 20 | 20.6 | 34.1 | 21.4 | 10.4 | 34.5 | 48.9 | 10.9 |
| Deformable DETR[6] | ResNet-50 | 1333×800 | 50 | 18.0 | 32.1 | 17.5 | 9.7 | 27.8 | 44.9 | 9.2 |
| RetinaNet[7] | ResNet-50 | 1333×800 | 20 | 18.5 | 30.1 | 19.3 | 8.2 | 31.7 | 48.0 | 16.6 |
| YOLOX-S[8] | CSPDarkNet | 640×640 | 300 | 19.9 | 34.6 | 19.6 | 10.8 | 30.9 | 42.6 | 53.1 |
| Cascade R-CNN[12] | ResNet-50 | 1333×800 | 20 | 24.5 | 39.3 | 25.9 | 15.4 | 36.9 | 45.2 | 9.0 |
| Grid R-CNN[13] | ResNet-50 | 1333×800 | 20 | 25.1 | 39.3 | 26.9 | 15.8 | 37.8 | 47.7 | 10.4 |
| FPN[15] | ResNet-50 | 1333×800 | 20 | 22.9 | 36.8 | 23.9 | 13.6 | 34.7 | 53.8 | 14.5 |
| VFNet[24] | ResNet-50 | 1333×800 | 20 | 22.5 | 37.4 | 23.5 | 13.1 | 34.5 | 45.4 | 13.7 |
| Double-Head RCNN[18] | ResNet-50 | 1333×800 | 20 | 23.8 | 38.3 | 24.8 | 15.0 | 35.1 | 44.8 | 6.5 |
| Proposed algorithm | R50-Attention | 1333×800 | 20 | 26.4 | 44.5 | 26.9 | 18.1 | 36.1 | 48.7 | 5.9 |
Table 2. Ablation experimental results
| R50-Attention | CARAFE-FPN | Anchor generation strategy | AP/% | AP50/% | AP75/% | APs/% | APm/% | APl/% | Params | FPS |
|  |  |  | 23.8 | 38.3 | 24.8 | 15.0 | 35.1 | 44.8 | 46.76×10⁶ | 6.5 |
|  |  | √ | 24.9 | 41.6 | 25.5 | 17.2 | 34.3 | 44.4 | 46.76×10⁶ | 6.9 |
| √ |  |  | 25.0 | 40.9 | 26.2 | 15.5 | 37.2 | 48.4 | 50.88×10⁶ | 5.9 |
|  | √ |  | 24.6 | 39.6 | 26.0 | 15.6 | 36.5 | 47.6 | 52.36×10⁶ | 6.4 |
|  | √ | √ | 25.6 | 42.7 | 26.2 | 17.6 | 35.4 | 46.0 | 52.36×10⁶ | 6.6 |
| √ | √ | √ | 26.4 | 44.5 | 26.9 | 18.1 | 36.1 | 48.7 | 56.49×10⁶ | 5.9 |

Note: each row is Double-Head RCNN[18] with the checked (√) modules added.
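For reference, the anchor generation strategy evaluated in Table 2 amounts to re-setting the RPN anchor scales to match the small-target scale distribution, which in an mmdetection-style configuration is a small change to the anchor generator. The snippet below is a hypothetical illustration; the concrete base-scale value is an assumption, not the paper's reported setting.

```python
# Hypothetical mmdetection-style RPN anchor settings; the base scale of 4 is
# an assumed value chosen to suit small targets, not the paper's number.
anchor_generator = dict(
    type='AnchorGenerator',
    scales=[4],                   # default Faster R-CNN uses 8; a smaller base
                                  # scale gives 16x16 px anchors on the stride-4
                                  # level, closer to COCO-style small (<32²px) objects
    ratios=[0.5, 1.0, 2.0],       # aspect ratios left unchanged
    strides=[4, 8, 16, 32, 64])   # one stride per FPN level P2-P6
```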
References

[1] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2014: 740-755.
[2] WANG J W, YANG W, GUO H W, et al. Tiny object detection in aerial images[C]//Proceedings of the International Conference on Pattern Recognition. Piscataway: IEEE Press, 2021: 3791-3798.
[3] CHEN Y X, DING W R, LI H G, et al. Vehicle detection in UAV image based on video interframe motion estimation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(3): 634-642 (in Chinese).
[4] WANG W H, XIE E Z, LI X, et al. PVT v2: Improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415-424. doi: 10.1007/s41095-022-0274-8
[5] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 213-229.
[6] ZHU X Z, SU W J, LU L W, et al. Deformable DETR: Deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2022-05-18].
[7] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2999-3007.
[8] GE Z, LIU S T, WANG F, et al. YOLOX: Exceeding YOLO series in 2021[EB/OL]. (2021-08-06)[2022-05-20].
[9] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. (2022-07-01)[2022-07-03].
[10] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
[11] PANG J M, CHEN K, SHI J P, et al. Libra R-CNN: Towards balanced learning for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 821-830.
[12] CAI Z W, VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6154-6162.
[13] LU X, LI B Y, YUE Y X, et al. Grid R-CNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7355-7364.
[14] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1440-1448.
[15] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
[16] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8759-8768.
[17] TAN M X, PANG R M, LE Q V. EfficientDet: Scalable and efficient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10778-10787.
[18] WU Y, CHEN Y P, YUAN L, et al. Rethinking classification and localization for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10183-10192.
[19] DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 764-773.
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[21] WANG J Q, CHEN K, XU R, et al. CARAFE: Content-aware ReAssembly of FEatures[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3007-3016.
[22] ZHU X Z, CHENG D Z, ZHANG Z, et al. An empirical study of spatial attention mechanisms in deep networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6687-6696.
[23] CAO Y R, HE Z J, WANG L J, et al. VisDrone-DET2021: The vision meets drone object detection challenge results[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE Press, 2021: 2847-2854.
[24] ZHANG H Y, WANG Y, DAYOUB F, et al. VarifocalNet: An IoU-aware dense object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 8510-8519.