Small target detection algorithm based on improved Double-Head RCNN for UAV aerial images

WANG Dianwei (王殿伟); HU Lichen (胡里晨); FANG Jie (房杰); XU Zhijie (许志杰)

doi: 10.13700/j.bh.1001-5965.2022.0591

1. School of Communications and Information Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
2. School of Computing and Engineering, University of Huddersfield, Huddersfield HD1 3DH, UK

Funds: National Natural Science Foundation of China (62201454); Postgraduate Innovation Foundation of Xi'an University of Posts and Telecommunications (CXJJLY2021058)

Corresponding author: E-mail: wangdianwei@xupt.edu.cn

CLC number: V279; TP391.41

Publication history
- Received: 2022-07-05
- Accepted: 2022-11-01
- Published online: 2023-01-10
- Issue published: 2024-07-18

Abstract:
Small targets in unmanned aerial vehicle (UAV) aerial images carry little feature information and are easily disturbed by noise, which leads to high missed detection and false detection rates in existing algorithms. To address this problem, a small target detection algorithm for UAV aerial images based on an improved Double-Head region-based convolutional neural network (RCNN) is proposed. First, Transformer and deformable convolution network (DCN) modules are introduced into the ResNet-50 backbone to extract the feature and semantic information of small targets more effectively. Second, a feature pyramid network (FPN) structure based on content-aware reassembly of features (CARAFE) is proposed, so that small-target feature information is not lost to background noise during feature fusion. Third, the anchor scales in the region proposal network are reset according to the scale distribution of small targets, which further improves small target detection performance. Experimental results on the VisDrone-DET2021 dataset show that the proposed algorithm extracts more representative feature and semantic information of small targets. Compared with the Double-Head RCNN algorithm, the proposed algorithm has 9.73×10⁶ more parameters and loses 0.6 frames per second (FPS), but AP, AP50, and AP75 increase by 2.6, 6.2, and 2.1 percentage points, respectively, and APs increases by 3.1 percentage points.

Keywords:
- small target detection
- unmanned aerial vehicle aerial images
- Double-Head RCNN
- Transformer
- content-aware reassembly of features
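The third modification in the abstract, resetting the anchor scales in the region proposal network, can be illustrated with a minimal sketch. The function name `make_anchors` and every numeric setting below (base size, scales, aspect ratios) are hypothetical values chosen for illustration; the abstract does not state the settings the authors used.

```python
import math


def make_anchors(base_size, scales, ratios):
    """Return (width, height) pairs for the anchors at one feature-map location.

    Each anchor has area (base_size * scale) ** 2 and aspect ratio h / w = ratio.
    """
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in ratios:
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((w, h))
    return anchors


# Hypothetical settings, NOT taken from the paper.
# One larger scale: anchors with a 32-pixel side on a stride-4 feature map.
default_anchors = make_anchors(base_size=4, scales=[8], ratios=[0.5, 1.0, 2.0])
# Smaller scales: anchors with 8- and 16-pixel sides, closer to small targets.
small_anchors = make_anchors(base_size=4, scales=[2, 4], ratios=[0.5, 1.0, 2.0])

for w, h in small_anchors:
    print(f"{w:.1f} x {h:.1f}")
```

Smaller scales give the region proposal network candidate boxes whose overlap with small ground-truth boxes is high enough for them to count as positive samples, which is the general motivation for tuning anchors to the target size distribution.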

Figure 1. Framework of the proposed algorithm

Figure 2. Structure of residual network

Figure 3. Structure of CARAFE^[21]

Figure 4. Target size distribution in VisDrone-DET2021^[23] dataset

Figure 5. Loss function curve

Figure 6. Detection results of different algorithms

Figure 7. Comparison of feature maps
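Figure 3 refers to CARAFE, the content-aware upsampling operator on which the proposed FPN structure is based. The reassembly step of such an operator can be sketched as follows, under simplifying assumptions: a single-channel feature map, zero padding at the borders, and per-location kernel logits passed in as an argument rather than predicted by a learned kernel-prediction branch. The function names are hypothetical, and this is not the authors' implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(feature, kernel_logits, scale=2, k=3):
    """Upsample an H x W map by `scale` using one k x k kernel per output pixel.

    feature:       H x W nested list of floats.
    kernel_logits: (H*scale) x (W*scale) x (k*k) nested list; each k*k vector is
                   softmax-normalised and applied to the k x k neighbourhood of
                   the source pixel that the output pixel maps back to.
    """
    h, w = len(feature), len(feature[0])
    r = k // 2
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for oi in range(h * scale):
        for oj in range(w * scale):
            weights = softmax(kernel_logits[oi][oj])
            ci, cj = oi // scale, oj // scale  # source location
            acc = 0.0
            for n in range(-r, r + 1):
                for m in range(-r, r + 1):
                    si, sj = ci + n, cj + m
                    if 0 <= si < h and 0 <= sj < w:  # zero padding outside
                        acc += weights[(n + r) * k + (m + r)] * feature[si][sj]
            out[oi][oj] = acc
    return out


# Kernels that put almost all weight on the centre tap reduce the operator
# to nearest-neighbour upsampling.
feature = [[1.0, 2.0], [3.0, 4.0]]
centre = [0.0] * 4 + [50.0] + [0.0] * 4
logits = [[centre for _ in range(4)] for _ in range(4)]
upsampled = carafe_reassemble(feature, logits)
```

Because the kernel differs at every output location, the operator can weight neighbouring features by content instead of applying one fixed interpolation rule everywhere.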

Table 1. Comparison between the proposed algorithm and advanced algorithms

Algorithm	Backbone	Input resolution/pixel	Epochs	AP/%	AP50/%	AP75/%	APs/%	APm/%	APl/%	FPS
RetinaNet+PVT v2^[4]	PVTv2-B1	1333×800	20	20.6	34.1	21.4	10.4	34.5	48.9	10.9
Deformable DETR^[6]	ResNet-50	1333×800	50	18.0	32.1	17.5	9.7	27.8	44.9	9.2
RetinaNet^[7]	ResNet-50	1333×800	20	18.5	30.1	19.3	8.2	31.7	48.0	16.6
YOLOX-S^[8]	CSPDarkNet	640×640	300	19.9	34.6	19.6	10.8	30.9	42.6	53.1
Cascade R-CNN^[12]	ResNet-50	1333×800	20	24.5	39.3	25.9	15.4	36.9	45.2	9.0
Grid R-CNN^[13]	ResNet-50	1333×800	20	25.1	39.3	26.9	15.8	37.8	47.7	10.4
FPN^[15]	ResNet-50	1333×800	20	22.9	36.8	23.9	13.6	34.7	53.8	14.5
VFNet^[24]	ResNet-50	1333×800	20	22.5	37.4	23.5	13.1	34.5	45.4	13.7
Double-Head RCNN^[18]	ResNet-50	1333×800	20	23.8	38.3	24.8	15.0	35.1	44.8	6.5
Proposed algorithm	R50-Attention	1333×800	20	26.4	44.5	26.9	18.1	36.1	48.7	5.9
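To read Table 1 metric by metric, the short script below finds which algorithm attains the best value in each column. The numbers are transcribed from the table; the names `TABLE_1`, `METRICS` and `best_per_metric` are introduced here for illustration only.

```python
# Values transcribed from Table 1 (AP columns in %, FPS in frames per second).
METRICS = ["AP", "AP50", "AP75", "APs", "APm", "APl", "FPS"]
TABLE_1 = {
    "RetinaNet+PVT v2": [20.6, 34.1, 21.4, 10.4, 34.5, 48.9, 10.9],
    "Deformable DETR": [18.0, 32.1, 17.5, 9.7, 27.8, 44.9, 9.2],
    "RetinaNet": [18.5, 30.1, 19.3, 8.2, 31.7, 48.0, 16.6],
    "YOLOX-S": [19.9, 34.6, 19.6, 10.8, 30.9, 42.6, 53.1],
    "Cascade R-CNN": [24.5, 39.3, 25.9, 15.4, 36.9, 45.2, 9.0],
    "Grid R-CNN": [25.1, 39.3, 26.9, 15.8, 37.8, 47.7, 10.4],
    "FPN": [22.9, 36.8, 23.9, 13.6, 34.7, 53.8, 14.5],
    "VFNet": [22.5, 37.4, 23.5, 13.1, 34.5, 45.4, 13.7],
    "Double-Head RCNN": [23.8, 38.3, 24.8, 15.0, 35.1, 44.8, 6.5],
    "Proposed": [26.4, 44.5, 26.9, 18.1, 36.1, 48.7, 5.9],
}


def best_per_metric(table, metrics):
    """Map each metric to (best value, sorted names of algorithms reaching it)."""
    best = {}
    for idx, metric in enumerate(metrics):
        top = max(row[idx] for row in table.values())
        winners = sorted(name for name, row in table.items() if row[idx] == top)
        best[metric] = (top, winners)
    return best


best = best_per_metric(TABLE_1, METRICS)
for metric in METRICS:
    value, winners = best[metric]
    print(f"{metric}: {value} ({', '.join(winners)})")
```

On these numbers the proposed algorithm leads on AP, AP50 and APs and ties Grid R-CNN on AP75, while Grid R-CNN, FPN and YOLOX-S lead on APm, APl and FPS respectively. YOLOX-S was run at a different input resolution and for more epochs, so its FPS is not directly comparable with the other rows.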

Table 2. Ablation experimental results (baseline: Double-Head RCNN^[18])

R50-Attention	CARAFE-FPN	Anchor generation strategy	AP/%	AP50/%	AP75/%	APs/%	APm/%	APl/%	Parameters	FPS
-	-	-	23.8	38.3	24.8	15.0	35.1	44.8	46.76×10⁶	6.5
-	-	√	24.9	41.6	25.5	17.2	34.3	44.4	46.76×10⁶	6.9
√	-	-	25.0	40.9	26.2	15.5	37.2	48.4	50.88×10⁶	5.9
-	√	-	24.6	39.6	26.0	15.6	36.5	47.6	52.36×10⁶	6.4
-	√	√	25.6	42.7	26.2	17.6	35.4	46.0	52.36×10⁶	6.6
√	√	√	26.4	44.5	26.9	18.1	36.1	48.7	56.49×10⁶	5.9
Note: √ indicates that the method is added to the baseline; - indicates that it is not.
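The differences quoted in the abstract can be recomputed from Table 2. The sketch below transcribes a subset of the columns and subtracts the baseline row from any other configuration; the dictionary layout, the key `params_M` (parameters in units of 10⁶) and the helper `delta` are naming choices made here, not the authors'.

```python
# Rows transcribed from Table 2, keyed by
# (R50-Attention, CARAFE-FPN, anchor generation strategy).
ABLATION = {
    (False, False, False): {"AP": 23.8, "AP50": 38.3, "AP75": 24.8, "APs": 15.0, "params_M": 46.76, "FPS": 6.5},
    (False, False, True): {"AP": 24.9, "AP50": 41.6, "AP75": 25.5, "APs": 17.2, "params_M": 46.76, "FPS": 6.9},
    (True, False, False): {"AP": 25.0, "AP50": 40.9, "AP75": 26.2, "APs": 15.5, "params_M": 50.88, "FPS": 5.9},
    (False, True, False): {"AP": 24.6, "AP50": 39.6, "AP75": 26.0, "APs": 15.6, "params_M": 52.36, "FPS": 6.4},
    (False, True, True): {"AP": 25.6, "AP50": 42.7, "AP75": 26.2, "APs": 17.6, "params_M": 52.36, "FPS": 6.6},
    (True, True, True): {"AP": 26.4, "AP50": 44.5, "AP75": 26.9, "APs": 18.1, "params_M": 56.49, "FPS": 5.9},
}
BASELINE = (False, False, False)


def delta(config):
    """Difference between a configuration and the Double-Head RCNN baseline."""
    base = ABLATION[BASELINE]
    return {key: round(value - base[key], 2) for key, value in ABLATION[config].items()}


full = delta((True, True, True))
attention_only = delta((True, False, False))
carafe_only = delta((False, True, False))
anchor_only = delta((False, False, True))

print("full model vs baseline:", full)
print("parameter overhead (10^6): R50-Attention",
      attention_only["params_M"], "CARAFE-FPN", carafe_only["params_M"])
print("anchor strategy alone: APs", anchor_only["APs"], "params", anchor_only["params_M"])
```

The full-model row reproduces the abstract's figures (9.73×10⁶ more parameters, 0.6 FPS lower, and gains of 2.6, 6.2, 2.1 and 3.1 points in AP, AP50, AP75 and APs). The anchor generation strategy adds no parameters, and the two module overheads of 4.12×10⁶ and 5.60×10⁶ sum to the total up to rounding in the table.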

References (24)

[1]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2014: 740-755.
[2]	WANG J W, YANG W, GUO H W, et al. Tiny object detection in aerial images[C]//Proceedings of the International Conference on Pattern Recognition. Piscataway: IEEE Press, 2021: 3791-3798.
[3]	CHEN Y X, DING W R, LI H G, et al. Vehicle detection in UAV image based on video interframe motion estimation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(3): 634-642 (in Chinese).
[4]	WANG W H, XIE E Z, LI X, et al. PVT v2: Improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415-424. doi: 10.1007/s41095-022-0274-8
[5]	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 213-229.
[6]	ZHU X Z, SU W J, LU L W, et al. Deformable DETR: Deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2022-05-18]. http://arxiv.org/abs/2010.04159.
[7]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2999-3007.
[8]	GE Z, LIU S T, WANG F, et al. YOLOX: Exceeding YOLO series in 2021[EB/OL]. (2021-08-06)[2022-05-20]. http://arxiv.org/abs/2107.08430.
[9]	WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. (2022-07-01)[2022-07-03]. http://arxiv.org/abs/2207.02696.
[10]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
[11]	PANG J M, CHEN K, SHI J P, et al. Libra R-CNN: Towards balanced learning for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 821-830.
[12]	CAI Z W, VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6154-6162.
[13]	LU X, LI B Y, YUE Y X, et al. Grid R-CNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7355-7364.
[14]	GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1440-1448.
[15]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
[16]	LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8759-8768.
[17]	TAN M X, PANG R M, LE Q V. EfficientDet: Scalable and efficient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10778-10787.
[18]	WU Y, CHEN Y P, YUAN L, et al. Rethinking classification and localization for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10183-10192.
[19]	DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 764-773.
[20]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[21]	WANG J Q, CHEN K, XU R, et al. CARAFE: Content-aware ReAssembly of FEatures[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3007-3016.
[22]	ZHU X Z, CHENG D Z, ZHANG Z, et al. An empirical study of spatial attention mechanisms in deep networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6687-6696.
[23]	CAO Y R, HE Z J, WANG L J, et al. VisDrone-DET2021: The vision meets drone object detection challenge results[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE Press, 2021: 2847-2854.
[24]	ZHANG H Y, WANG Y, DAYOUB F, et al. VarifocalNet: An IoU-aware dense object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 8510-8519.