Abstract: To address the high miss and false detection rates for small objects in object detection, an improved model based on the Yolov3-Tiny algorithm is proposed. The k-means clustering method is improved, 3×3 and 1×1 convolutional pooling layers are added, and the output of the 9th convolutional layer is upsampled and concatenated with the feature map from the 8th convolutional layer to produce a new 52×52 output layer, forming a new feature pyramid. Object tracking is implemented with the Kalman filtering algorithm, and a detection network fused with the tracking algorithm is proposed: the Hungarian algorithm optimally matches the detection bounding boxes to the tracking bounding boxes, and the tracking results are used to correct the detection results, which improves both detection speed and detection capability. The proposed algorithms are compared in an integrated simulation environment built on ROS, Gazebo, and the PX4 autopilot software. The test results show that the improved algorithm lowers the average detection speed by 15.6% and raises the mAP by 6.5%. Fusing the tracking algorithm into the network raises the average detection speed by 34.2% and the mAP by 8.6%. The network with the fused tracking algorithm meets the system's requirements for real-time performance and accuracy.
Key words:
- object detection
- Yolov3-Tiny
- object tracking
- Kalman filter
- Hungarian matching
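
The matching step described in the abstract, pairing detection boxes with tracking boxes, can be illustrated with a short sketch. It is not the authors' implementation: the cost `1 - IoU`, the `(x1, y1, x2, y2)` box format, the function names, and the `iou_min` threshold of 0.3 are all assumptions made for illustration. The assignment itself uses a standard potential-based form of the Hungarian algorithm.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list whose i-th entry is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (m + 1)      # column potentials (1-based)
    p = [0] * (m + 1)        # p[j] = row matched to column j, 0 = unmatched
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:       # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            result[p[j] - 1] = j - 1
    return result


def match(detections, tracks, iou_min=0.3):
    """Return sorted (detection index, track index) pairs with IoU >= iou_min."""
    if not detections or not tracks:
        return []
    cost = [[1.0 - iou(d, t) for t in tracks] for d in detections]
    if len(detections) <= len(tracks):
        pairs = list(enumerate(hungarian(cost)))
    else:  # more detections than tracks: solve the transposed problem
        transposed = [list(col) for col in zip(*cost)]
        pairs = [(d, t) for t, d in enumerate(hungarian(transposed))]
    return sorted((d, t) for d, t in pairs if 1.0 - cost[d][t] >= iou_min)


detections = [(0, 0, 10, 10), (20, 20, 30, 30)]
tracks = [(21, 21, 31, 31), (1, 1, 11, 11)]
print(match(detections, tracks))
```

In this sketch, pairs whose overlap falls below the threshold are dropped after the assignment, so an unmatched detection could start a new track and an unmatched track could be carried forward by its predicted state. How the paper handles these cases is not stated in the abstract.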
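Table 1. Yolov3-Tiny backbone network and receptive field

| Convolutional layer | Kernel size | Stride | Input size | Output size | Receptive field |
|---|---|---|---|---|---|
| 1 | 3 | 1 | 416 | 416 | 3 |
| 2 | 3 | 1 | 208 | 208 | 8 |
| 3 | 3 | 1 | 104 | 104 | 18 |
| 4 | 3 | 1 | 52 | 52 | 38 |
| 5 | 3 | 1 | 26 | 26 | 78 |
| 6 | 3 | 1 | 13 | 13 | 158 |
| 7 | 3 | 1 | 13 | 13 | 254 |
| 8 | 3 | 1 | 13 | 13 | 318 |
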
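
The receptive-field values in Table 1 can be reproduced with the usual layer-by-layer recurrence (receptive field grows by `(kernel - 1) * jump`, and `jump` is multiplied by the stride). The sketch below assumes that a 2×2 max-pooling layer follows each of the first six convolutions, with stride 2 after layers 1 to 5 and stride 1 after layer 6, as in the standard Yolov3-Tiny configuration. The table does not list these pooling layers, so the layout and the function name are assumptions.

```python
def receptive_fields(layers):
    """Return the receptive field after each convolutional layer.

    layers is a list of (kind, kernel, stride) tuples, where kind is
    "conv" or "pool". Pooling layers enlarge the receptive field too,
    but only the values after "conv" layers are reported.
    """
    rf, jump = 1, 1  # receptive field and cumulative stride at the input
    fields = []
    for kind, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        if kind == "conv":
            fields.append(rf)
    return fields


# Assumed layout: eight 3x3 stride-1 convolutions (as in Table 1), with a
# 2x2 max-pool after each of the first six (stride 2, then stride 1 after layer 6).
layers = []
for i in range(1, 9):
    layers.append(("conv", 3, 1))
    if i <= 5:
        layers.append(("pool", 2, 2))
    elif i == 6:
        layers.append(("pool", 2, 1))

print(receptive_fields(layers))
```

Under these assumptions the computed values agree with the last column of Table 1, which also explains why the input size halves between layers 1 and 6 even though every convolution has stride 1.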
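Table 2. Experimental environment

| Parameter | Configuration |
|---|---|
| CPU | Intel i7-8700 |
| GPU | NVIDIA GTX 1070 |
| Operating system | Ubuntu 16.04 |
| Acceleration environment | CUDA 9.0, cuDNN 7.0 |
| Training framework | Darknet |
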
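Table 3. Experimental comparison results

| Algorithm | AP50, person/% | AP50, car/% | AP50, fire hydrant/% | AP50, road sign/% | mAP50/% | Average detection speed/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| Yolov3-Tiny | 72.23 | 75.48 | 69.49 | 66.67 | 71.23 | 45 |
| Improved Yolov3-Tiny | 77.12 | 80.45 | 72.18 | 70.23 | 75.83 | 38 |
| Yolov3-Tiny fused with the tracking algorithm | 82.33 | 87.14 | 77.47 | 75.28 | 82.34 | 51 |
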
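
The percentages quoted in the abstract appear to be relative changes between consecutive rows of Table 3: the improved network against Yolov3-Tiny, and the fused network against the improved network. The short check below is a reading of the table under that assumption, not a statement from the paper. The dictionary keys and the helper name are made up for illustration.

```python
def relative_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# mAP50 (%) and average detection speed (frames per second) from Table 3.
results = {
    "baseline": {"map50": 71.23, "fps": 45},
    "improved": {"map50": 75.83, "fps": 38},
    "fused": {"map50": 82.34, "fps": 51},
}

for old, new in [("baseline", "improved"), ("improved", "fused")]:
    for metric in ("fps", "map50"):
        change = relative_change(results[old][metric], results[new][metric])
        print(f"{old} -> {new}, {metric}: {change:+.1f}%")
```

Read this way, the 6.5% and 8.6% mAP gains are relative improvements. The corresponding absolute gains in Table 3 are 4.60 and 6.51 percentage points.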