Abstract: Existing person re-identification methods mainly focus on learning local pedestrian features to match a specific pedestrian across different cameras. However, when pedestrian data are incomplete, for example under body-part motion, occlusion, or background interference, local discriminative information is more likely to be lost. To address this problem, this paper presents a multi-scale joint learning method that produces a refined representation of pedestrian features. The method consists of three branch networks that extract a coarse-grained global feature, a fine-grained global feature, and fine-grained local features, respectively. The coarse-grained global branch enriches the global feature by fusing semantic information from different levels; the fine-grained global branch concatenates all local features, describing the global feature at a fine granularity while learning the correlations among a pedestrian's local body parts; and the fine-grained local branch traverses the local features to mine non-salient pedestrian information, thereby strengthening the robustness of the local features. To verify the effectiveness of the proposed method, comparative experiments against state-of-the-art methods were conducted on three public person re-identification datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results show that the proposed method achieves the best performance.
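As a concrete illustration of the three-branch design described above, the following is a minimal PyTorch sketch, mirroring the CG, FG, and FP branches referenced in Table 2. It assumes a ResNet-50 backbone (the usual baseline, cf. [16]), six horizontal part stripes, and 256-dimensional embeddings; the module names, stripe count, and embedding size are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the three-branch multi-scale network; details
# (shared trunk, stripe count, embedding size) are assumptions, not the
# paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class MultiScaleReID(nn.Module):
    """Coarse-grained global + fine-grained global + fine-grained local branches."""

    def __init__(self, num_classes, num_parts=6, embed_dim=256):
        super().__init__()
        r = resnet50(pretrained=True)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)
        self.layer2, self.layer3, self.layer4 = r.layer2, r.layer3, r.layer4
        self.num_parts = num_parts
        # Coarse-grained global (CG) branch: fuse multi-level semantics.
        self.fuse = nn.Conv2d(512 + 1024 + 2048, embed_dim, kernel_size=1)
        # Fine-grained local (FP) branch: one embedding per horizontal stripe.
        self.part_embed = nn.ModuleList(
            nn.Conv2d(2048, embed_dim, kernel_size=1) for _ in range(num_parts))
        # Identity classifiers for each feature (cross-entropy supervision).
        self.classifier_cg = nn.Linear(embed_dim, num_classes)
        self.classifier_fg = nn.Linear(embed_dim * num_parts, num_classes)
        self.classifier_fp = nn.ModuleList(
            nn.Linear(embed_dim, num_classes) for _ in range(num_parts))

    def forward(self, x):
        x1 = self.stem(x)
        x2 = self.layer2(x1)
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)
        # CG feature: pool mid/high-level maps to one size, fuse, then GAP.
        size = x4.shape[-2:]
        fused = torch.cat([F.adaptive_avg_pool2d(t, size) for t in (x2, x3, x4)], dim=1)
        cg = self.fuse(fused).mean(dim=(2, 3))                    # (B, embed_dim)
        # FP features: horizontal stripes of the last feature map.
        stripes = F.adaptive_avg_pool2d(x4, (self.num_parts, 1))  # (B, 2048, P, 1)
        parts = [self.part_embed[i](stripes[:, :, i:i + 1, :]).flatten(1)
                 for i in range(self.num_parts)]                  # P x (B, embed_dim)
        # FG feature: joint (concatenated) view of all local parts.
        fg = torch.cat(parts, dim=1)                              # (B, P*embed_dim)
        logits = ([self.classifier_cg(cg), self.classifier_fg(fg)]
                  + [c(p) for c, p in zip(self.classifier_fp, parts)])
        return cg, fg, parts, logits
```

In a typical training setup of this kind, each identity logit would be supervised with cross-entropy and the pooled features with a triplet loss (cf. [26]-[28]); at test time the concatenated features serve as the matching descriptor.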
Key words:
- person re-identification
- multi-scale
- joint learning
- multi-branch network
- deep learning
Table 1. Performance comparison between the multi-scale joint learning method and other methods (%)

| Category | Method | CUHK03-Labeled Rank-1 | CUHK03-Labeled mAP | CUHK03-Detected Rank-1 | CUHK03-Detected mAP | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP |
|---|---|---|---|---|---|---|---|---|---|
| Part-based | IDE[15] | 22.0 | 21.0 | 21.3 | 19.7 | 72.5 | 46.0 | 67.7 | 47.1 |
| | MGN[8] | 68.0 | 67.4 | 66.8 | 66.0 | 95.7 | 86.9 | 88.7 | 78.4 |
| | PCB[3] | 61.9 | 56.8 | 60.6 | 54.4 | 92.3 | 77.4 | 81.7 | 66.1 |
| | Pyramid[6] | 78.9 | 76.9 | 78.9 | 74.8 | 95.7 | 88.2 | 89.0 | 79.0 |
| | GFLF-S[34] | 76.6 | 73.5 | 74.4 | 69.6 | 94.8 | 88.0 | 89.3 | 77.1 |
| Attention-based | CASN[35] | 73.7 | 68.0 | 71.5 | 64.4 | 94.4 | 82.8 | 87.7 | 73.7 |
| | CAMA[36] | 70.1 | 66.5 | 66.6 | 64.2 | 94.7 | 84.5 | 85.8 | 72.9 |
| | Mancs[37] | 69.0 | 63.9 | 65.5 | 60.5 | 93.1 | 82.3 | 84.9 | 71.8 |
| | HACNN[24] | 44.4 | 41.0 | 41.7 | 38.6 | 91.2 | 75.7 | 80.5 | 63.9 |
| Others | DPFL[38] | 43.0 | 40.5 | 40.7 | 37.0 | 88.9 | 73.1 | 79.2 | 60.0 |
| | BDB[39] | 73.6 | 71.7 | 72.8 | 69.3 | 94.2 | 84.3 | 86.8 | 72.1 |
| | SVDNet[40] | 40.9 | 37.8 | 41.5 | 37.3 | 82.3 | 62.1 | 76.7 | 56.8 |
| Ours | Multi-scale joint | 80.7 | 77.0 | 78.0 | 73.4 | 95.9 | 89.1 | 90.0 | 80.4 |
Table 2. Ablation experiments of the multi-scale joint learning method (%)

| Method | CUHK03-Labeled Rank-1 | CUHK03-Labeled mAP | CUHK03-Detected Rank-1 | CUHK03-Detected mAP | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP |
|---|---|---|---|---|---|---|---|---|
| Baseline | 59.1 | 54.2 | 55.1 | 50.2 | 93.5 | 82.4 | 85.3 | 72.0 |
| Baseline + CG | 69.8 | 66.1 | 66.9 | 62.6 | 94.8 | 86.9 | 87.9 | 76.7 |
| Baseline + FG | 70.9 | 67.1 | 68.2 | 63.3 | 95.1 | 87.3 | 88.2 | 77.9 |
| Baseline + CG + FG | 76.4 | 72.1 | 73.0 | 68.4 | 95.3 | 88.7 | 88.7 | 79.1 |
| Baseline + CG + FP1 | 78.4 | 75.1 | 76.0 | 72.2 | 95.6 | 88.7 | 89.1 | 78.5 |
| Baseline + CG + FP2 | 78.7 | 75.2 | 75.5 | 71.5 | 95.6 | 89.0 | 89.2 | 79.6 |
| Baseline + FG + FP1 | 77.3 | 73.1 | 76.4 | 71.8 | 95.6 | 88.5 | 89.5 | 78.9 |
| Baseline + FG + FP2 | 77.6 | 74.2 | 75.0 | 71.4 | 95.7 | 88.8 | 89.5 | 79.8 |
| Baseline + CG + FG + FP1 | 80.7 | 77.0 | 78.0 | 73.4 | 95.9 | 88.8 | 89.6 | 79.2 |
| Baseline + CG + FG + FP2 | 80.8 | 76.7 | 76.0 | 71.8 | 95.9 | 89.1 | 90.0 | 80.4 |
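For reference, the Rank-1 and mAP scores reported in both tables can be computed from a query-gallery distance matrix as sketched below. This is a simplified sketch: it omits the junk-image and same-camera filtering of the standard Market1501 evaluation protocol, and `rank1_and_map` is a hypothetical helper name, not code from the paper.

```python
# Simplified Rank-1 / mAP computation from a distance matrix; the full
# benchmark protocol additionally filters same-camera and junk matches.
import numpy as np

def rank1_and_map(dist, q_ids, g_ids):
    """dist: (num_query, num_gallery) distance matrix; *_ids: identity labels."""
    aps, rank1_hits = [], 0
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])            # gallery sorted by distance
        matches = (g_ids[order] == q_ids[i])   # True where identity matches
        if not matches.any():
            continue                           # query has no ground truth
        rank1_hits += matches[0]               # top-ranked image is correct
        hit_ranks = np.where(matches)[0] + 1   # 1-based ranks of true matches
        precision = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        aps.append(precision.mean())           # average precision per query
    return rank1_hits / dist.shape[0], float(np.mean(aps))

# Toy example: 2 queries against 4 gallery images.
dist = np.array([[0.1, 0.4, 0.3, 0.9],
                 [0.8, 0.2, 0.5, 0.1]])
q_ids = np.array([7, 3])
g_ids = np.array([7, 3, 7, 3])
print(rank1_and_map(dist, q_ids, g_ids))  # -> (1.0, 1.0)
```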
References:
[1] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539
[2] ZHAO H, TIAN M, SUN S, et al. Spindle Net: Person re-identification with human body region guided feature decomposition and fusion[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 907-915.
[3] SUN Y, ZHENG L, YANG Y, et al. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 480-496.
[4] ZHENG L, HUANG Y, LU H, et al. Pose-invariant embedding for deep person re-identification[J]. IEEE Transactions on Image Processing, 2019, 28(9): 4500-4509. doi: 10.1109/TIP.2019.2910414
[5] WEI L, ZHANG S, YAO H, et al. GLAD: Global-local-alignment descriptor for pedestrian retrieval[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM Press, 2017: 420-428.
[6] ZHENG F, DENG C, SUN X, et al. Pyramidal person re-identification via multi-loss dynamic training[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 8514-8522.
[7] FU Y, WEI Y, ZHOU Y, et al. Horizontal pyramid matching for person re-identification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 8295-8302.
[8] WANG G, YUAN Y, CHEN X, et al. Learning discriminative features with multiple granularities for person re-identification[C]//Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM Press, 2018: 274-282.
[9] WANG Z, JIANG J, WU Y, et al. Learning sparse and identity-preserved hidden attributes for person re-identification[J]. IEEE Transactions on Image Processing, 2019, 29(1): 2013-2025.
[10] ZENG Z, WANG Z, WANG Z, et al. Illumination-adaptive person re-identification[J]. IEEE Transactions on Multimedia, 2020, 22(12): 3064-3074. doi: 10.1109/TMM.2020.2969782
[11] WANG Z, WANG Z, ZHENG Y, et al. Beyond intra-modality: A survey of heterogeneous person re-identification[EB/OL]. (2020-04-27)[2020-07-23]. https://arxiv.org/abs/1905.10048v4.
[12] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2005: 886-893.
[13] LIAO S, HU Y, ZHU X, et al. Person re-identification by local maximal occurrence representation and metric learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 2197-2206.
[14] KOESTINGER M, HIRZER M, WOHLHART P, et al. Large scale metric learning from equivalence constraints[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2012: 2288-2295.
[15] ZHENG L, YANG Y, HAUPTMANN A G. Person re-identification: Past, present and future[EB/OL]. [2020-07-23]. https://arxiv.org/abs/1610.02984.
[16] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[17] SU C, LI J, ZHANG S, et al. Pose-driven deep convolutional model for person re-identification[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3980-3989.
[18] SUH Y, WANG J, TANG S, et al. Part-aligned bilinear representations for person re-identification[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 418-437.
[19] XU J, ZHAO R, ZHU F, et al. Attention-aware compositional network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2119-2128.
[20] SARFRAZ M S, SCHUMANN A, EBERLE A, et al. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 420-429.
[21] ZHENG W S, LI X, XIANG T, et al. Partial person re-identification[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 4678-4686.
[22] LUO H, JIANG W, ZHANG X, et al. AlignedReID++: Dynamically matching local information for person re-identification[J]. Pattern Recognition, 2019, 94: 53-61. doi: 10.1016/j.patcog.2019.05.028
[23] SUN Y, XU Q, LI Y, et al. Perceive where to focus: Learning visibility-aware part-level features for partial person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 393-402.
[24] LI W, ZHU X, GONG S. Harmonious attention network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2285-2294.
[25] LIU X, ZHAO H, TIAN M, et al. HydraPlus-Net: Attentive deep features for pedestrian analysis[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 350-359.
[26] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: A unified embedding for face recognition and clustering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 815-823.
[27] WANG F, XIANG X, CHENG J, et al. NormFace: L2 hypersphere embedding for face verification[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM Press, 2017: 1041-1049.
[28] HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. (2017-11-17)[2020-07-23]. https://arxiv.org/abs/1703.07737.
[29] ZHENG L, SHEN L, TIAN L, et al. Scalable person re-identification: A benchmark[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1116-1124.
[30] LI W, ZHAO R, XIAO T, et al. DeepReID: Deep filter pairing neural network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 152-159.
[31] ZHONG Z, ZHENG L, CAO D, et al. Re-ranking person re-identification with k-reciprocal encoding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1318-1327.
[32] ZHENG Z, ZHENG L, YANG Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3754-3762.
[33] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2009: 248-255.
[34] PARK H, HAM B. Relation network for person re-identification[EB/OL]. [2020-07-23]. https://arxiv.org/abs/1911.09318.
[35] ZHENG M, KARANAM S, WU Z, et al. Re-identification with consistent attentive siamese networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5735-5744.
[36] YANG W, HUANG H, ZHANG Z, et al. Towards rich feature discovery with class activation maps augmentation for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1389-1398.
[37] WANG C, ZHANG Q, HUANG C, et al. Mancs: A multi-task attentional network with curriculum sampling for person re-identification[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 365-381.
[38] CHEN Y, ZHU X, GONG S. Person re-identification by deep learning multi-scale representations[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2590-2600.
[39] DAI Z, CHEN M, GU X, et al. Batch DropBlock network for person re-identification and beyond[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3690-3700.
[40] SUN Y, ZHENG L, DENG W, et al. SVDNet for pedestrian retrieval[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3820-3828.