Abstract: Existing person re-identification methods mainly focus on learning local pedestrian features to match a specific pedestrian across different cameras. However, when pedestrian data are incomplete, for example under body-part motion, occlusion, or background interference, local discriminative information is more likely to be lost. To address this problem, this paper presents a multi-scale joint learning method that produces a refined representation of pedestrian features. The method consists of three branch networks that extract a coarse-grained global feature, a fine-grained global feature, and fine-grained local features, respectively. The coarse-grained global branch enriches the global feature by fusing semantic information from different levels; the fine-grained global branch concatenates all local features, describing the global feature at a fine granularity while learning the correlations among a pedestrian's local body parts; and the fine-grained local branch traverses the local features to mine non-salient pedestrian information, thereby strengthening the robustness of the local features. To verify the effectiveness of the proposed method, comparative experiments against state-of-the-art methods were conducted on three public person re-identification datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results show that the proposed method achieves the best performance.
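As a concrete illustration of the three-branch design described above, the following is a minimal PyTorch sketch, mirroring the CG, FG, and FP branches referenced in Table 2. It assumes a ResNet-50 backbone (the usual baseline, cf. [16]), six horizontal part stripes, and 256-dimensional embeddings; the module names, stripe count, and embedding size are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the three-branch multi-scale network; details
# (shared trunk, stripe count, embedding size) are assumptions, not the
# paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class MultiScaleReID(nn.Module):
    """Coarse-grained global + fine-grained global + fine-grained local branches."""

    def __init__(self, num_classes, num_parts=6, embed_dim=256):
        super().__init__()
        r = resnet50(pretrained=True)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)
        self.layer2, self.layer3, self.layer4 = r.layer2, r.layer3, r.layer4
        self.num_parts = num_parts
        # Coarse-grained global (CG) branch: fuse multi-level semantics.
        self.fuse = nn.Conv2d(512 + 1024 + 2048, embed_dim, kernel_size=1)
        # Fine-grained local (FP) branch: one embedding per horizontal stripe.
        self.part_embed = nn.ModuleList(
            nn.Conv2d(2048, embed_dim, kernel_size=1) for _ in range(num_parts))
        # Identity classifiers for each feature (cross-entropy supervision).
        self.classifier_cg = nn.Linear(embed_dim, num_classes)
        self.classifier_fg = nn.Linear(embed_dim * num_parts, num_classes)
        self.classifier_fp = nn.ModuleList(
            nn.Linear(embed_dim, num_classes) for _ in range(num_parts))

    def forward(self, x):
        x1 = self.stem(x)
        x2 = self.layer2(x1)
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)
        # CG feature: pool mid/high-level maps to one size, fuse, then GAP.
        size = x4.shape[-2:]
        fused = torch.cat([F.adaptive_avg_pool2d(t, size) for t in (x2, x3, x4)], dim=1)
        cg = self.fuse(fused).mean(dim=(2, 3))                    # (B, embed_dim)
        # FP features: horizontal stripes of the last feature map.
        stripes = F.adaptive_avg_pool2d(x4, (self.num_parts, 1))  # (B, 2048, P, 1)
        parts = [self.part_embed[i](stripes[:, :, i:i + 1, :]).flatten(1)
                 for i in range(self.num_parts)]                  # P x (B, embed_dim)
        # FG feature: joint (concatenated) view of all local parts.
        fg = torch.cat(parts, dim=1)                              # (B, P*embed_dim)
        logits = ([self.classifier_cg(cg), self.classifier_fg(fg)]
                  + [c(p) for c, p in zip(self.classifier_fp, parts)])
        return cg, fg, parts, logits
```

In a typical training setup of this kind, each identity logit would be supervised with cross-entropy and the pooled features with a triplet loss (cf. [26]-[28]); at test time the concatenated features serve as the matching descriptor.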
Key words:
- person re-identification
- multi-scale
- joint learning
- multi-branch network
- deep learning
Table 1. Performance comparison between the multi-scale joint learning method and other methods (%)

| Category | Method | CUHK03-Labeled Rank-1 | CUHK03-Labeled mAP | CUHK03-Detected Rank-1 | CUHK03-Detected mAP | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP |
|---|---|---|---|---|---|---|---|---|---|
| Part-based | IDE[15] | 22.0 | 21.0 | 21.3 | 19.7 | 72.5 | 46.0 | 67.7 | 47.1 |
| | MGN[8] | 68.0 | 67.4 | 66.8 | 66.0 | 95.7 | 86.9 | 88.7 | 78.4 |
| | PCB[3] | 61.9 | 56.8 | 60.6 | 54.4 | 92.3 | 77.4 | 81.7 | 66.1 |
| | Pyramid[6] | 78.9 | 76.9 | 78.9 | 74.8 | 95.7 | 88.2 | 89.0 | 79.0 |
| | GFLF-S[34] | 76.6 | 73.5 | 74.4 | 69.6 | 94.8 | 88.0 | 89.3 | 77.1 |
| Attention-based | CASN[35] | 73.7 | 68.0 | 71.5 | 64.4 | 94.4 | 82.8 | 87.7 | 73.7 |
| | CAMA[36] | 70.1 | 66.5 | 66.6 | 64.2 | 94.7 | 84.5 | 85.8 | 72.9 |
| | Mancs[37] | 69.0 | 63.9 | 65.5 | 60.5 | 93.1 | 82.3 | 84.9 | 71.8 |
| | HACNN[24] | 44.4 | 41.0 | 41.7 | 38.6 | 91.2 | 75.7 | 80.5 | 63.9 |
| Others | DPFL[38] | 43.0 | 40.5 | 40.7 | 37.0 | 88.9 | 73.1 | 79.2 | 60.0 |
| | BDB[39] | 73.6 | 71.7 | 72.8 | 69.3 | 94.2 | 84.3 | 86.8 | 72.1 |
| | SVDNet[40] | 40.9 | 37.8 | 41.5 | 37.3 | 82.3 | 62.1 | 76.7 | 56.8 |
| Ours | Multi-scale joint | 80.7 | 77.0 | 78.0 | 73.4 | 95.9 | 89.1 | 90.0 | 80.4 |
Table 2. Ablation experiments of the multi-scale joint learning method (%)

| Method | CUHK03-Labeled Rank-1 | CUHK03-Labeled mAP | CUHK03-Detected Rank-1 | CUHK03-Detected mAP | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP |
|---|---|---|---|---|---|---|---|---|
| Baseline | 59.1 | 54.2 | 55.1 | 50.2 | 93.5 | 82.4 | 85.3 | 72.0 |
| Baseline + CG | 69.8 | 66.1 | 66.9 | 62.6 | 94.8 | 86.9 | 87.9 | 76.7 |
| Baseline + FG | 70.9 | 67.1 | 68.2 | 63.3 | 95.1 | 87.3 | 88.2 | 77.9 |
| Baseline + CG + FG | 76.4 | 72.1 | 73.0 | 68.4 | 95.3 | 88.7 | 88.7 | 79.1 |
| Baseline + CG + FP1 | 78.4 | 75.1 | 76.0 | 72.2 | 95.6 | 88.7 | 89.1 | 78.5 |
| Baseline + CG + FP2 | 78.7 | 75.2 | 75.5 | 71.5 | 95.6 | 89.0 | 89.2 | 79.6 |
| Baseline + FG + FP1 | 77.3 | 73.1 | 76.4 | 71.8 | 95.6 | 88.5 | 89.5 | 78.9 |
| Baseline + FG + FP2 | 77.6 | 74.2 | 75.0 | 71.4 | 95.7 | 88.8 | 89.5 | 79.8 |
| Baseline + CG + FG + FP1 | 80.7 | 77.0 | 78.0 | 73.4 | 95.9 | 88.8 | 89.6 | 79.2 |
| Baseline + CG + FG + FP2 | 80.8 | 76.7 | 76.0 | 71.8 | 95.9 | 89.1 | 90.0 | 80.4 |
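For reference, the Rank-1 and mAP scores reported in both tables can be computed from a query-gallery distance matrix as sketched below. This is a simplified sketch: it omits the junk-image and same-camera filtering of the standard Market1501 evaluation protocol, and `rank1_and_map` is a hypothetical helper name, not code from the paper.

```python
# Simplified Rank-1 / mAP computation from a distance matrix; the full
# benchmark protocol additionally filters same-camera and junk matches.
import numpy as np

def rank1_and_map(dist, q_ids, g_ids):
    """dist: (num_query, num_gallery) distance matrix; *_ids: identity labels."""
    aps, rank1_hits = [], 0
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])            # gallery sorted by distance
        matches = (g_ids[order] == q_ids[i])   # True where identity matches
        if not matches.any():
            continue                           # query has no ground truth
        rank1_hits += matches[0]               # top-ranked image is correct
        hit_ranks = np.where(matches)[0] + 1   # 1-based ranks of true matches
        precision = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        aps.append(precision.mean())           # average precision per query
    return rank1_hits / dist.shape[0], float(np.mean(aps))

# Toy example: 2 queries against 4 gallery images.
dist = np.array([[0.1, 0.4, 0.3, 0.9],
                 [0.8, 0.2, 0.5, 0.1]])
q_ids = np.array([7, 3])
g_ids = np.array([7, 3, 7, 3])
print(rank1_and_map(dist, q_ids, g_ids))  # -> (1.0, 1.0)
```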
References:
[1] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539
[2] ZHAO H, TIAN M, SUN S, et al. Spindle Net: Person re-identification with human body region guided feature decomposition and fusion[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 907-915.
[3] SUN Y, ZHENG L, YANG Y, et al. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 480-496.
[4] ZHENG L, HUANG Y, LU H, et al. Pose-invariant embedding for deep person re-identification[J]. IEEE Transactions on Image Processing, 2019, 28(9): 4500-4509. doi: 10.1109/TIP.2019.2910414
[5] WEI L, ZHANG S, YAO H, et al. GLAD: Global-local-alignment descriptor for pedestrian retrieval[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM Press, 2017: 420-428.
[6] ZHENG F, DENG C, SUN X, et al. Pyramidal person re-identification via multi-loss dynamic training[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 8514-8522.
[7] FU Y, WEI Y, ZHOU Y, et al. Horizontal pyramid matching for person re-identification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 8295-8302.
[8] WANG G, YUAN Y, CHEN X, et al. Learning discriminative features with multiple granularities for person re-identification[C]//Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM Press, 2018: 274-282.
[9] WANG Z, JIANG J, WU Y, et al. Learning sparse and identity-preserved hidden attributes for person re-identification[J]. IEEE Transactions on Image Processing, 2019, 29(1): 2013-2025.
[10] ZENG Z, WANG Z, WANG Z, et al. Illumination-adaptive person re-identification[J]. IEEE Transactions on Multimedia, 2020, 22(12): 3064-3074. doi: 10.1109/TMM.2020.2969782
[11] WANG Z, WANG Z, ZHENG Y, et al. Beyond intra-modality: A survey of heterogeneous person re-identification[EB/OL]. (2020-04-27)[2020-07-23]. https://arxiv.org/abs/1905.10048v4.
[12] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2005: 886-893.
[13] LIAO S, HU Y, ZHU X, et al. Person re-identification by local maximal occurrence representation and metric learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 2197-2206.
[14] KOESTINGER M, HIRZER M, WOHLHART P, et al. Large scale metric learning from equivalence constraints[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2012: 2288-2295.
[15] ZHENG L, YANG Y, HAUPTMANN A G. Person re-identification: Past, present and future[EB/OL]. [2020-07-23]. https://arxiv.org/abs/1610.02984.
[16] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[17] SU C, LI J, ZHANG S, et al. Pose-driven deep convolutional model for person re-identification[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3980-3989.
[18] SUH Y, WANG J, TANG S, et al. Part-aligned bilinear representations for person re-identification[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 418-437.
[19] XU J, ZHAO R, ZHU F, et al. Attention-aware compositional network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2119-2128.
[20] SARFRAZ M S, SCHUMANN A, EBERLE A, et al. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 420-429.
[21] ZHENG W S, LI X, XIANG T, et al. Partial person re-identification[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 4678-4686.
[22] LUO H, JIANG W, ZHANG X, et al. AlignedReID++: Dynamically matching local information for person re-identification[J]. Pattern Recognition, 2019, 94: 53-61. doi: 10.1016/j.patcog.2019.05.028
[23] SUN Y, XU Q, LI Y, et al. Perceive where to focus: Learning visibility-aware part-level features for partial person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 393-402.
[24] LI W, ZHU X, GONG S. Harmonious attention network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2285-2294.
[25] LIU X, ZHAO H, TIAN M, et al. HydraPlus-Net: Attentive deep features for pedestrian analysis[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 350-359.
[26] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: A unified embedding for face recognition and clustering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 815-823.
[27] WANG F, XIANG X, CHENG J, et al. NormFace: L2 hypersphere embedding for face verification[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM Press, 2017: 1041-1049.
[28] HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. (2017-11-17)[2020-07-23]. https://arxiv.org/abs/1703.07737.
[29] ZHENG L, SHEN L, TIAN L, et al. Scalable person re-identification: A benchmark[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1116-1124.
[30] LI W, ZHAO R, XIAO T, et al. DeepReID: Deep filter pairing neural network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 152-159.
[31] ZHONG Z, ZHENG L, CAO D, et al. Re-ranking person re-identification with k-reciprocal encoding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1318-1327.
[32] ZHENG Z, ZHENG L, YANG Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3754-3762.
[33] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2009: 248-255.
[34] PARK H, HAM B. Relation network for person re-identification[EB/OL]. [2020-07-23]. https://arxiv.org/abs/1911.09318.
[35] ZHENG M, KARANAM S, WU Z, et al. Re-identification with consistent attentive siamese networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5735-5744.
[36] YANG W, HUANG H, ZHANG Z, et al. Towards rich feature discovery with class activation maps augmentation for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1389-1398.
[37] WANG C, ZHANG Q, HUANG C, et al. Mancs: A multi-task attentional network with curriculum sampling for person re-identification[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 365-381.
[38] CHEN Y, ZHU X, GONG S. Person re-identification by deep learning multi-scale representations[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2590-2600.
[39] DAI Z, CHEN M, GU X, et al. Batch DropBlock network for person re-identification and beyond[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3690-3700.
[40] SUN Y, ZHENG L, DENG W, et al. SVDNet for pedestrian retrieval[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3820-3828.