An iterative pedestrian detection method sensitive to historical information features

DAI Peizhe; LIU Xiang; ZHANG Xing; SHANG Yanfeng; ZHAO Jingwen; WANG Shiyu

doi: 10.13700/j.bh.1001-5965.2021.0665

1. School of Electronic and Electric Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
2. School of Management, Shanghai University of Engineering Science, Shanghai 201620, China
3. Internet of Things R&D Technology Center, The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China

Funds: National Key R&D Program of China (2017YFC0821603); Natural Science Foundation of Shanghai (19ZR1421500)

Corresponding author: E-mail: xliu@sues.edu.cn

CLC number: TP391.4

Publication history
- Received: 2021-11-05
- Accepted: 2022-01-27
- Published online: 2022-02-15
- Issue published: 2023-10-01

Abstract:
Object detection algorithms based on deep learning usually rely on post-processing such as non-maximum suppression (NMS) to filter predicted boxes, and therefore cannot balance detection accuracy and recall in crowded pedestrian scenes. Iterative detection avoids the problems introduced by NMS-style post-processing, but repeated detections in turn limit model performance. This paper proposes an iterative pedestrian detection method that is sensitive to historical information features. First, weighted historical information characteristics (WHIC) are introduced to make the features more discriminative. Second, a historical information feature extraction module (HIFEM) extracts historical information features at several scales and fuses them into the main network for multi-scale detection, which increases the model's sensitivity to historical information features and effectively suppresses duplicate detection boxes. Experimental results show that, among the compared methods, the proposed method achieves the best detection accuracy and recall on the crowded-scene pedestrian detection datasets CrowdHuman and WiderPerson.

Keywords:
- machine vision
- object detection
- feature fusion
- convolutional neural network
- deep learning
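
The idea of a weighted history map can be illustrated with a small sketch. IterDet, as commonly described, feeds the detector a history map in which each pixel counts the already-detected boxes that cover it. The sketch below assumes, hypothetically, that each earlier box carries an occlusion level and adds a level-dependent weight instead of 1. The function names, the level labels, and the way the weights are applied are illustrative assumptions, not the authors' implementation; the weights 4/8/12 are simply the setting that scored best in the weight coefficient tables.

```python
# Hypothetical sketch, not the paper's implementation.
# Assumption: every previously detected box comes with an occlusion level,
# and the pixels it covers are incremented by the weight for that level.
OCCLUSION_WEIGHTS = {"mild": 4, "moderate": 8, "severe": 12}


def weighted_history_map(height, width, boxes, weights=OCCLUSION_WEIGHTS):
    """Build a height x width grid from boxes given as (x1, y1, x2, y2, level).

    Box coordinates are half-open: columns x1..x2-1 and rows y1..y2-1.
    """
    history = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2, level in boxes:
        w = weights[level]
        # Clip the box to the image so out-of-range coordinates are ignored.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                history[y][x] += w
    return history


def plain_history_map(height, width, boxes):
    """Unweighted variant: every covering box adds 1 to a pixel."""
    uniform = {"mild": 1, "moderate": 1, "severe": 1}
    return weighted_history_map(height, width, boxes, uniform)


if __name__ == "__main__":
    earlier_boxes = [(0, 0, 2, 2, "mild"), (1, 1, 3, 3, "severe")]
    for row in weighted_history_map(4, 4, earlier_boxes):
        print(row)
```

With larger weights on heavily occluded boxes, pixels in crowded regions stand out more strongly from the background than in the plain count map, which is one plausible reading of how weighting could make the history features more discriminative.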

Figure 1. Structure of the IterDet iterative network

Figure 2. Detection boxes produced by two IterDet detection iterations

Figure 3. Network structure of HIFEM

Figure 4. Detection results

Table 1. Parameters of HIFEM

Layer name	Output size	Layer parameters
conv1	$112 \times 112$	$7 \times 7$, 64, stride 2
conv2_x	$56 \times 56$	$3 \times 3$ max pool, stride 2
conv2_x	$56 \times 56$	$\begin{pmatrix} 3 \times 3, & 256 \\ 3 \times 3, & 256 \end{pmatrix} \times 2$
conv3_x	$28 \times 28$	$\begin{pmatrix} 3 \times 3, & 512 \\ 3 \times 3, & 512 \end{pmatrix} \times 2$
conv4_x	$14 \times 14$	$\begin{pmatrix} 3 \times 3, & 1024 \\ 3 \times 3, & 1024 \end{pmatrix} \times 2$
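
The output sizes in Table 1 can be checked with the standard convolution size formula. The sketch below is only a consistency check under assumptions the table does not state: a 224 x 224 input, padding 3 for the 7 x 7 convolution and 1 for every 3 x 3 layer, and stride-2 downsampling at the start of conv3_x and conv4_x (the usual ResNet convention). The names `conv_out`, `STAGES`, and `stage_sizes` are illustrative.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (stage, kernel, stride, padding). Strides and paddings are assumptions,
# except the stride 2 that Table 1 lists for conv1 and the max pool.
STAGES = [
    ("conv1", 7, 2, 3),
    ("conv2_x", 3, 2, 1),  # the 3x3 max pool that opens conv2_x
    ("conv3_x", 3, 2, 1),  # assumed stride-2 first convolution
    ("conv4_x", 3, 2, 1),  # assumed stride-2 first convolution
]


def stage_sizes(input_size=224):
    """Return the spatial size after each stage for a square input."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in STAGES:
        size = conv_out(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


if __name__ == "__main__":
    for name, size in stage_sizes().items():
        print(f"{name}: {size} x {size}")
```

Under these assumptions the stages come out at 112, 56, 28, and 14, matching the output sizes listed in the table; the stride-1 3 x 3 convolutions inside each stage leave the size unchanged.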

Table 2. Weight coefficient experiments with RetinaNet+IterDet on the CrowdHuman dataset

Mild-occlusion weight	Moderate-occlusion weight	Severe-occlusion weight	Recall (%)	Detection accuracy (%)	Avg. number of duplicate detection boxes
-	-	-	91.49	84.77	11.89
2	4	6	92.39	85.45	8.87
4	8	12	93.26	85.59	8.56
6	12	18	92.13	85.16	8.59
8	16	24	88.19	85.11	7.18
10	20	30	80.76	81.13	6.12

Table 3. Weight coefficient experiments with Faster R-CNN+IterDet on the CrowdHuman dataset

Mild-occlusion weight	Moderate-occlusion weight	Severe-occlusion weight	Recall (%)	Detection accuracy (%)	Avg. number of duplicate detection boxes
-	-	-	95.80	88.08	8.45
2	4	6	96.14	89.18	7.34
4	8	12	96.54	89.56	6.12
6	12	18	96.34	88.65	5.06
8	16	24	95.16	85.17	5.12
10	20	30	90.46	83.14	4.06

Table 4. Weight coefficient experiments with RetinaNet+IterDet on the WiderPerson dataset

Mild-occlusion weight	Moderate-occlusion weight	Severe-occlusion weight	Recall (%)	Detection accuracy (%)	Avg. number of duplicate detection boxes
-	-	-	95.35	90.23	8.66
2	4	6	95.44	91.59	7.15
4	8	12	96.13	92.43	6.21
6	12	18	95.89	91.87	6.34
8	16	24	90.14	88.56	5.22
10	20	30	88.19	85.34	5.12

Table 5. Weight coefficient experiments with Faster R-CNN+IterDet on the WiderPerson dataset

Mild-occlusion weight	Moderate-occlusion weight	Severe-occlusion weight	Recall (%)	Detection accuracy (%)	Avg. number of duplicate detection boxes
-	-	-	97.15	91.95	5.65
2	4	6	97.16	92.59	4.49
4	8	12	97.60	93.14	4.45
6	12	18	94.17	90.15	4.19
8	16	24	90.23	88.71	4.01
10	20	30	88.59	84.88	3.67

Table 6. Ablation results on the CrowdHuman dataset

Detector	Recall (%): IterDet	Recall (%): IterDet+HIFEM	Detection accuracy (%): IterDet	Detection accuracy (%): IterDet+HIFEM	Avg. duplicate boxes: IterDet	Avg. duplicate boxes: IterDet+HIFEM
RetinaNet	91.49	96.76	84.77	88.98	11.89	3.34
Faster R-CNN	95.80	97.10	88.08	91.10	8.45	2.21
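
To make the effect on duplicate boxes in Table 6 easier to read, the relative reduction can be computed directly from the table values. The snippet below is a small worked calculation; the helper name `relative_reduction` is illustrative, and the numbers are copied from the CrowdHuman ablation table.

```python
# Average number of duplicate detection boxes from Table 6 (CrowdHuman):
# (IterDet, IterDet+HIFEM) for each detector.
DUPLICATES = {
    "RetinaNet": (11.89, 3.34),
    "Faster R-CNN": (8.45, 2.21),
}


def relative_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for detector, (before, after) in DUPLICATES.items():
        drop = relative_reduction(before, after)
        print(f"{detector}: {before} -> {after} ({drop:.1f}% fewer duplicates)")
```

Both detectors lose roughly seven tenths of their duplicate boxes when HIFEM is added, while recall and detection accuracy rise in the same table.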

Table 7. Ablation results on the WiderPerson dataset

Detector	Recall (%): IterDet	Recall (%): IterDet+HIFEM	Detection accuracy (%): IterDet	Detection accuracy (%): IterDet+HIFEM	Avg. duplicate boxes: IterDet	Avg. duplicate boxes: IterDet+HIFEM
RetinaNet	95.35	97.60	90.23	94.70	8.66	4.12
Faster R-CNN	97.15	98.23	91.95	95.40	5.65	1.81

Table 8. Comparison results on the CrowdHuman dataset

Detector	Recall (%): Baseline	Recall (%): PS-RCNN	Recall (%): IterDet	Recall (%): Proposed	Detection accuracy (%): Baseline	Detection accuracy (%): PS-RCNN	Detection accuracy (%): IterDet	Detection accuracy (%): Proposed
RetinaNet	93.80	-	91.49	97.40	80.83	-	84.77	89.73
Faster R-CNN	90.24	93.77	95.80	97.98	84.95	86.05	88.08	91.15

Table 9. Comparison results on the WiderPerson dataset

Detector	Recall (%): Baseline	Recall (%): PS-RCNN	Recall (%): IterDet	Recall (%): Proposed	Detection accuracy (%): Baseline	Detection accuracy (%): PS-RCNN	Detection accuracy (%): IterDet	Detection accuracy (%): Proposed
RetinaNet	90.20	-	95.35	98.87	89.12	-	90.23	95.99
Faster R-CNN	93.60	94.71	97.15	98.67	88.89	89.96	91.95	96.67
