Vehicle detection in UAV image based on video interframe motion estimation

CHEN Yingxue; DING Wenrui; LI Hongguang; WANG Meng; WANG Xu

doi: 10.13700/j.bh.1001-5965.2019.0279

1. School of Electronic and Information Engineering, Beihang University, Beijing 100083, China
2. Institute of Unmanned System, Beihang University, Beijing 100083, China
3. Heyintelligence Technology Limited Company, Beijing 100083, China

Funds:

National Defense Basic Scientific Research Program of China JCKY2017601C006

Open Fund of State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University 17E01

Author information:
CHEN Yingxue  Female, master's student. Research interests: remote sensing image object detection algorithms and applications

DING Wenrui  Female, Ph.D., research fellow, doctoral supervisor. Research interests: multi-source image information processing, visual object detection and tracking

LI Hongguang  Male, Ph.D., senior engineer, master's supervisor. Research interests: intelligent processing of optical images for unmanned systems and edge computing applications

Corresponding author: LI Hongguang, lihongguang@buaa.edu.cn

CLC number: TP183; TP301.6

Publication history
- Received: 2019-06-03
- Accepted: 2019-09-20
- Published online: 2020-03-20

Abstract:
A lightweight deep neural network deployed on an artificial intelligence (AI) chip can automatically detect vehicles in video onboard an unmanned aerial vehicle (UAV), which has important application prospects. This paper proposes a vehicle detection method for UAV images and deploys and tests it on an AI chip. First, the MobileNet-SSD network is trimmed according to the size range of vehicle objects in UAV images to build a lightweight single-frame detector. Second, to counter the drop in detection performance that small objects cause in a lightweight network, interframe motion vector estimation is introduced: information from adjacent frames is used to predict the position range of objects missed in the current frame, and the prediction is then corrected with the detection results so that the missed objects are recalled. A high-quality UAV image vehicle dataset was built by fusing multiple datasets and automatically supplementing their annotations. The method was verified on an embedded development platform based on the RK3399 chip. The results show that the constructed network significantly reduces storage usage and is lightweight. Compared with single-frame detection, introducing video interframe motion estimation effectively improves detection accuracy, and the method runs at 125.3 ms per frame on the AI chip.

Keywords:
- unmanned aerial vehicle (UAV)
- object detection
- lightweight neural network
- artificial intelligence (AI) chip
- motion estimation
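
The re-recall step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The page does not define the two thresholds listed in Table 4, so the sketch assumes that the matching threshold is a minimum IoU between a motion-predicted box and a current-frame box, and that the recall threshold is a minimum confidence score for low-scoring candidates. The function names (`shift_box`, `iou`, `recall_missing`), the per-object motion vectors, and the choice to drop a predicted box when no candidate supports it are likewise assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def shift_box(box: Box, motion: Tuple[float, float]) -> Box:
    """Move a previous-frame box by its estimated motion vector (dx, dy)."""
    dx, dy = motion
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_missing(prev_boxes: List[Box],
                   motions: List[Tuple[float, float]],
                   detections: List[Box],
                   candidates: List[Tuple[Box, float]],
                   match_thr: float = 0.5,
                   recall_thr: float = 0.1) -> List[Box]:
    """Return current-frame boxes: confident detections plus re-recalled ones.

    detections: boxes the single-frame detector reported with confidence.
    candidates: (box, score) pairs the detector scored too low to report.
    """
    results = list(detections)
    for box, motion in zip(prev_boxes, motions):
        predicted = shift_box(box, motion)
        # Object already found in the current frame: nothing to recover.
        if any(iou(predicted, d) >= match_thr for d in detections):
            continue
        # Otherwise look for a low-score candidate near the predicted position.
        overlaps = [(iou(predicted, c), c) for c, s in candidates if s >= recall_thr]
        usable = [(o, c) for o, c in overlaps if o >= match_thr]
        if usable:
            # Correct the prediction with the best-overlapping candidate box.
            results.append(max(usable, key=lambda oc: oc[0])[1])
    return results


# Hypothetical data: two objects tracked from the previous frame,
# one of which the detector missed in the current frame.
prev_boxes = [(100, 100, 140, 120), (300, 200, 340, 220)]
motions = [(5, 0), (4, 0)]
detections = [(105, 100, 145, 120)]
candidates = [((303, 201, 343, 221), 0.15), ((500, 500, 540, 520), 0.3)]
print(recall_missing(prev_boxes, motions, detections, candidates))
```

Using the candidate box rather than the raw prediction in the final result reflects the "corrected with the detection results" wording of the abstract; a real system would obtain the motion vectors from an estimator such as dense optical flow.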

Figure 1. Object detection framework based on video interframe motion estimation

Figure 2. Detection correction based on candidate results

Figure 3. Comparison of the dataset before and after processing

Figure 4. System structure

Figure 5. Comparison of storage size among different network models

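Table 1. UAV image vehicle dataset

Parameter	Training set	Test set
Number of sequences	50	10
Number of images	35298	6812
Sequences with vertical top-down view	12 (training and test sets combined)
Sequences with oblique top-down view	48 (training and test sets combined)
Minimum image size/(pixel×pixel)	960×540 (training and test sets combined)
Maximum image size/(pixel×pixel)	2688×1512 (training and test sets combined)
Object size range/(pixel×pixel)	10×10 to 300×300 (training and test sets combined)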

Table 2. Performance comparison of different video detection methods

Video detection method	Precision P/%	Recall R/%	F1/%	Detection speed/(ms·frame^-1)
MobileNetV1_SSD	79.29	34.32	47.90	124.3
MobileNetV1_SSD_cut	79.17	35.42	48.94	119.2
Optical flow	74.32	41.18	52.99	125.0
Proposed method	76.80	43.65	55.66	125.3
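
The F1 column of Table 2 can be cross-checked against the precision and recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall, which this page does not state explicitly; the helper names `f1_score` and `frames_per_second` are chosen here for illustration. Under that assumption the recomputed values agree with the reported ones to within 0.01, the small differences being consistent with the table having been computed from unrounded precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (percent in, percent out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


# (P/%, R/%, reported F1/%, ms per frame) copied from Table 2
ROWS = {
    "MobileNetV1_SSD": (79.29, 34.32, 47.90, 124.3),
    "MobileNetV1_SSD_cut": (79.17, 35.42, 48.94, 119.2),
    "Optical flow": (74.32, 41.18, 52.99, 125.0),
    "Proposed method": (76.80, 43.65, 55.66, 125.3),
}

for name, (p, r, reported, ms) in ROWS.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f} (reported {reported:.2f}), "
          f"{frames_per_second(ms):.2f} frames/s")
```

Converted this way, 125.3 ms per frame corresponds to roughly 8 frames per second.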

Table 3. AP improvement based on different lightweight networks

Video detection method	AP/%
MobileNetV1_SSD	48.38
MobileNetV2_SSD	31.82
Proposed method (V1)	49.23
Proposed method (V2)	33.05

Table 4. Comparison of experimental results with different threshold settings

Matching threshold	Recall threshold	P/%	R/%	F1/%
0.7	0.1	77.64	42.92	55.28
0.6	0.1	77.33	43.43	55.62
0.5	0.1	76.80	43.65	55.66
0.5	0	70.14	43.70	53.84
0.5	0.2	79.03	40.89	53.89
0.5	0.3	78.66	33.43	46.91
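
One way to read Table 4 is as a small grid search over the two thresholds. The sketch below simply picks the row with the highest F1; the `Trial` type and `best_by_f1` helper are names introduced here for illustration, and selecting by F1 is an assumption about the selection criterion rather than something the page states.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    match_thr: float   # matching threshold
    recall_thr: float  # recall threshold
    precision: float   # P/%
    recall: float      # R/%
    f1: float          # F1/%


# Rows copied from Table 4
TRIALS = [
    Trial(0.7, 0.1, 77.64, 42.92, 55.28),
    Trial(0.6, 0.1, 77.33, 43.43, 55.62),
    Trial(0.5, 0.1, 76.80, 43.65, 55.66),
    Trial(0.5, 0.0, 70.14, 43.70, 53.84),
    Trial(0.5, 0.2, 79.03, 40.89, 53.89),
    Trial(0.5, 0.3, 78.66, 33.43, 46.91),
]


def best_by_f1(trials: List[Trial]) -> Trial:
    """Return the threshold setting with the highest reported F1."""
    return max(trials, key=lambda t: t.f1)


best = best_by_f1(TRIALS)
print(f"matching threshold = {best.match_thr}, recall threshold = {best.recall_thr}")
```

The selected row (matching threshold 0.5, recall threshold 0.1) has the same precision, recall and F1 as the proposed method's row in Table 2, which suggests these are the settings used for the headline results. The table also shows the trade-off: lowering the recall threshold to 0 raises recall slightly but costs more than six points of precision, while raising it to 0.3 cuts recall by about ten points.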
