Abstract: A lightweight deep neural network deployed on an artificial intelligence (AI) chip can detect vehicles automatically in video onboard an unmanned aerial vehicle (UAV), which has important application prospects. This paper proposes a vehicle detection method for UAV imagery and deploys and tests it on an AI chip. First, the MobileNet-SSD network is trimmed according to the size range of vehicles in UAV images to build a lightweight single-frame detector. Second, to counter the loss of detection performance that small objects cause in a lightweight network, interframe motion vector estimation is introduced: information from adjacent frames is used to predict the position range of objects missed in the current frame, and the prediction is corrected with the detection results so that the lost objects are recalled. In addition, a high-quality UAV vehicle dataset was built by fusing several datasets and automatically supplementing their annotations. The method was verified on an embedded development platform based on the RK3399 chip. The results show that the trimmed network markedly reduces storage usage and is therefore lightweight, and that, compared with single-frame detection, adding interframe motion estimation improves detection accuracy while running at 125.3 ms per frame on the AI chip.

Keywords:
- unmanned aerial vehicle (UAV)
- object detection
- lightweight neural network
- artificial intelligence (AI) chip
- motion estimation
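
The re-recall step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-object motion vectors are already available (the sketch does not estimate them), that the "matching threshold" of Table 4 is an IoU threshold, and that the "recall threshold" is a confidence floor for weak detections. The names `Box`, `shift`, `recall_lost`, and `keep_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with a detection confidence (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def shift(box: Box, motion: tuple) -> Box:
    """Move a previous-frame box by its estimated motion vector (dx, dy)."""
    dx, dy = motion
    return Box(box.x1 + dx, box.y1 + dy, box.x2 + dx, box.y2 + dy, box.score)


def recall_lost(prev_boxes, motions, candidates,
                keep_score=0.5, match_iou=0.5, recall_score=0.1):
    """Return current-frame detections plus weak candidates recalled by motion.

    Assumption: candidates scoring >= keep_score are ordinary detections;
    candidates in [recall_score, keep_score) may be recalled only if they
    overlap the motion-predicted position of a previous-frame object.
    """
    kept = [c for c in candidates if c.score >= keep_score]
    weak = [c for c in candidates if recall_score <= c.score < keep_score]
    for box, motion in zip(prev_boxes, motions):
        predicted = shift(box, motion)
        if any(iou(predicted, k) >= match_iou for k in kept):
            continue  # the object is still detected in the current frame
        # The object is lost: correct the prediction with the best weak candidate.
        best = max(weak, key=lambda c: iou(predicted, c), default=None)
        if best is not None and iou(predicted, best) >= match_iou:
            kept.append(best)
            weak.remove(best)
    return kept
```

Under these assumptions, a lower confidence floor lets more weak candidates through, which would tend to raise recall at the cost of precision; Table 4 shows such a trade-off for the method's own thresholds.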
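Table 1. Vehicle dataset of UAV images

| Parameter | Value |
| --- | --- |
| Number of sequences (training set / test set) | 50 / 10 |
| Number of images (training set / test set) | 35298 / 6812 |
| Number of sequences with nadir (vertical top-down) view | 12 |
| Number of sequences with oblique top-down view | 48 |
| Minimum image size/(pixel×pixel) | 960×540 |
| Maximum image size/(pixel×pixel) | 2688×1512 |
| Object size range/(pixel×pixel) | 10×10 to 300×300 |
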
Table 2. Performance comparison of different video detection methods

| Video detection method | P/% | R/% | F1/% | Detection speed/(ms/frame) |
| --- | --- | --- | --- | --- |
| MobileNetV1_SSD | 79.29 | 34.32 | 47.90 | 124.3 |
| MobileNetV1_SSD_cut | 79.17 | 35.42 | 48.94 | 119.2 |
| Optical flow | 74.32 | 41.18 | 52.99 | 125.0 |
| Proposed method | 76.80 | 43.65 | 55.66 | 125.3 |

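
The F1 column of Table 2 can be cross-checked against the P and R columns, assuming F1 is the usual harmonic mean of precision and recall (the page does not state the formula). The function name `f1_percent` is hypothetical. In this short sketch the recomputed values agree with the table to within about 0.01 percentage points, a residual consistent with P and R having been rounded before publication.

```python
def f1_percent(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, P/%, R/%, reported F1/%) copied from Table 2
rows = [
    ("MobileNetV1_SSD", 79.29, 34.32, 47.90),
    ("MobileNetV1_SSD_cut", 79.17, 35.42, 48.94),
    ("Optical flow", 74.32, 41.18, 52.99),
    ("Proposed method", 76.80, 43.65, 55.66),
]

for name, p, r, reported in rows:
    print(f"{name}: recomputed {f1_percent(p, r):.3f}, reported {reported:.2f}")
```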
Table 3. AP improvement with different lightweight networks

| Video detection method | AP/% |
| --- | --- |
| MobileNetV1_SSD | 48.38 |
| MobileNetV2_SSD | 31.82 |
| Proposed method (V1) | 49.23 |
| Proposed method (V2) | 33.05 |

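Table 4. Comparison of experimental results with different threshold settings

| Matching threshold | Recall threshold | P/% | R/% | F1/% |
| --- | --- | --- | --- | --- |
| 0.7 | 0.1 | 77.64 | 42.92 | 55.28 |
| 0.6 | 0.1 | 77.33 | 43.43 | 55.62 |
| 0.5 | 0.1 | 76.80 | 43.65 | 55.66 |
| 0.5 | 0 | 70.14 | 43.70 | 53.84 |
| 0.5 | 0.2 | 79.03 | 40.89 | 53.89 |
| 0.5 | 0.3 | 78.66 | 33.43 | 46.91 |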