A lightweight multi-target real-time detection model

QIU Bo; LIU Xiang; SHI Yunyu; SHANG Yanfeng

doi:10.13700/j.bh.1001-5965.2020.0066

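QIU Bo¹, LIU Xiang¹, SHI Yunyu¹, SHANG Yanfeng²

1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
2. Internet of Things Technology R & D Center, The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China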

Funds:

National Key R & D Program of China 2016YFC0801304

Shanghai Science and Technology Innovation Action Plan in Hi-tech Field 17511106803

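Author biographies:
QIU Bo   male, master's student. Main research interests: computer vision, deep learning, object detection

LIU Xiang   male, Ph.D., associate professor, master's supervisor. Main research interests: computer vision and artificial life

SHI Yunyu   female, Ph.D., lecturer, master's supervisor. Main research interests: intelligent analysis of video big data

SHANG Yanfeng   male, Ph.D., associate research fellow. Main research interests: pattern recognition and video big data

Corresponding author:
LIU Xiang, E-mail: xliu@sues.edu.cn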

CLC number: TP391.4
Publication history
- Received: 2020-03-02
- Accepted: 2020-04-18
- Published online: 2020-09-20

Abstract:
To make content analysis in public security surveillance systems more accurate and intelligent and to improve its practical service capability, a lightweight multi-target real-time detection algorithm is proposed. First, the multi-fusion stepwise cascade structure of CBNet is added to the CenterNet detection network, which effectively addresses the insufficient feature extraction capability of the backbone network on everyday surveillance video. Second, the network is compressed through model pruning to reduce the number of parameters, which speeds up surveillance video analysis. The dataset used for training and testing consists of part of the COCO dataset and field data collected by the authors. Ablation experiments are conducted against other mainstream detection algorithms (YOLO, Faster-RCNN, SSD, etc.). The experimental results show that the proposed model effectively balances speed and precision in public security surveillance and generalizes well.
- target detection /
- deep learning /
- model compression /
- model distillation /
- cascade fusion
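
The abstract states that pruning reduces the parameter count but does not name the pruning criterion. As a rough illustration only, the sketch below ranks the filters of one convolution layer by L1 norm and keeps the largest ones. The L1 criterion, the function name `select_filters`, and the `keep_ratio` parameter are assumptions made for this example and are not taken from the paper.

```python
def select_filters(filters, keep_ratio):
    """Return the sorted indices of the filters to keep.

    filters: list of filters, each given as a flat list of weights
             (hypothetical layout chosen for this sketch).
    keep_ratio: fraction of filters to keep, in (0, 1].
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(round(len(filters) * keep_ratio)))
    # L1 norm of each filter: sum of absolute weight values.
    norms = [sum(abs(w) for w in f) for f in filters]
    # Rank filter indices from largest to smallest norm.
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    # Keep the top n_keep filters, restored to their original order.
    return sorted(ranked[:n_keep])


layer = [
    [0.1, -0.1],   # L1 norm 0.2
    [2.0, 1.0],    # L1 norm 3.0
    [0.0, 0.05],   # L1 norm 0.05
    [-3.0, 0.5],   # L1 norm 3.5
]
kept = select_filters(layer, keep_ratio=0.5)
pruned_layer = [layer[i] for i in kept]
print(kept)
```

In a real network, removing a filter also removes the matching input channel of the following layer, and the pruned model is normally fine-tuned afterwards; neither step is shown here.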

Figure 1. CenterNet model structure

Figure 2. Peak response visualization

Figure 3. Backbone network illustration

Figure 4. Depthwise separable convolution principle

Figure 5. Relationship between sparsity and precision

Figure 6. Partial pictures of dataset

Figure 7. Collaborative filtering comparison

Table 1. Precision comparison of mainstream models

Model	Recall	Precision	Memory/KB
ResNet-18	0.83	0.80	25 014
ResNet-18+hourglass+CBNet	0.92	0.90	24 254
ResNet-18×2+CBNet	0.86	0.83	22 254
Hourglass×2+CBNet	0.85	0.83	23 280
SSD	0.87	0.84	23 020
slim YOLOv3	0.90	0.88	33 894


Table 2. Model precision before and after compression

Model	Recall	Precision	Memory/KB
ResNet-18+hourglass+CBNet	0.92	0.90	24 254
ResNet-18+hourglass+CBNet (after pruning)	0.90	0.88	3 676
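
The memory figures in Table 2 can be turned into a compression ratio with a few lines of arithmetic. The snippet below is only a convenience calculation on the two reported sizes; the helper name `compression_stats` is made up for this example.

```python
def compression_stats(size_before_kb, size_after_kb):
    """Return (compression ratio, fractional size reduction)."""
    ratio = size_before_kb / size_after_kb
    reduction = 1.0 - size_after_kb / size_before_kb
    return ratio, reduction


# Memory values in KB, copied from Table 2.
ratio, reduction = compression_stats(24254, 3676)
print(f"{ratio:.2f}x smaller, {reduction:.1%} size reduction")
# prints "6.60x smaller, 84.8% size reduction"
```

By the table's own numbers, this reduction comes with a drop of 0.02 in both the recall and precision columns.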


Table 3. Comparison of model inference speed

Model	Frames/s
CenterNet(ResNet-18)	6.6
ResNet-18+hourglass+CBNet	9.9
ResNet-18 (depthwise separable convolution)+hourglass+CBNet	11.2
ResNet-18+hourglass+CBNet (after pruning)	19.2
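
To read Table 3 as relative gains, the frame rates can be converted into speed-up over the CenterNet(ResNet-18) baseline and into per-frame latency. The sketch below does only that arithmetic; the dictionary keys are shortened English labels and `speed_summary` is a hypothetical helper name.

```python
# Frame rates (frames/s) copied from Table 3.
FPS = {
    "CenterNet(ResNet-18)": 6.6,
    "ResNet-18+hourglass+CBNet": 9.9,
    "ResNet-18(depthwise separable)+hourglass+CBNet": 11.2,
    "ResNet-18+hourglass+CBNet(pruned)": 19.2,
}
BASELINE = "CenterNet(ResNet-18)"


def speed_summary(fps_by_model, baseline):
    """Map each model to (speed-up over baseline, latency in ms per frame)."""
    base = fps_by_model[baseline]
    return {
        name: (fps / base, 1000.0 / fps)
        for name, fps in fps_by_model.items()
    }


summary = speed_summary(FPS, BASELINE)
for name, (speedup, latency_ms) in summary.items():
    print(f"{name}: {speedup:.2f}x, {latency_ms:.1f} ms/frame")
```

On these figures the pruned model runs at about 2.9 times the baseline frame rate, or roughly 52 ms per frame.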

