Abstract: To address the problem of insufficient pedestrian feature representation, we propose a person re-identification method based on a channel attention mechanism. The SE (Squeeze-and-Excitation) channel attention module is embedded into the ResNet50 backbone to weight and strengthen key feature information; a dynamic activation function (DyReLU) adjusts the parameters of ReLU according to the input features, enhancing the nonlinear representation ability of the network; and the gradient centralization (GC) algorithm is introduced into the Adam optimizer to improve training speed and generalization. Evaluated on the mainstream Market1501, DukeMTMC-ReID, and CUHK03 datasets, the improved model raises Rank-1 by 2.17%, 2.38%, and 3.50%, and mAP by 3.07%, 3.39%, and 4.14%, respectively. The results indicate that the improved model extracts more robust pedestrian features and achieves higher recognition accuracy.
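The three components summarized above (SE channel attention, DyReLU, and gradient centralization) are standard building blocks. The following is a rough, framework-free sketch of each idea, not the authors' implementation; all function names, shapes, and the fixed DyReLU coefficients are illustrative assumptions:

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """SE channel attention on a feature map x of shape (C, H, W).

    Squeeze: global average pooling; Excitation: FC -> ReLU -> FC -> sigmoid;
    Scale: channel-wise reweighting of x.
    """
    c = x.shape[0]
    z = x.mean(axis=(1, 2))                    # squeeze: (C,)
    s = np.maximum(w1 @ z + b1, 0.0)           # bottleneck FC (C -> C/r) + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))   # FC (C/r -> C) + sigmoid gate
    return x * s.reshape(c, 1, 1)              # scale each channel

def dyrelu(x, a=(1.0, 0.1), b=(0.0, 0.0)):
    """Dynamic ReLU reduced to fixed coefficients: max over K=2 linear pieces.

    In the actual DyReLU the (a, b) coefficients are predicted from the input
    by a small hyper-network; fixed values are used here only to show the
    piecewise-linear form of the activation.
    """
    return np.maximum(a[0] * x + b[0], a[1] * x + b[1])

def centralize_gradient(g):
    """Gradient centralization: remove the mean of each filter's gradient.

    Applied to a weight gradient before the optimizer update; for a conv
    kernel of shape (out_ch, in_ch, k, k) the mean is taken over every axis
    except the first, so each output filter's gradient has zero mean.
    """
    if g.ndim > 1:
        g = g - g.mean(axis=tuple(range(1, g.ndim)), keepdims=True)
    return g
```

In a training step, each weight gradient would pass through `centralize_gradient` before the Adam update, while the SE gates rescale the output channels of the residual blocks; the exact placement inside ResNet50 follows the SE paper [10].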
Table 1. Overview of the datasets

| Dataset | Year released | Cameras | IDs | Images |
| --- | --- | --- | --- | --- |
| Market1501 | 2015 | 6 | 1,501 | 32,668 |
| DukeMTMC-ReID | 2017 | 8 | 1,404 | 36,411 |
| CUHK03 | 2014 | 10 | 1,467 | 13,164 |

Table 2. Performance comparison of the model with the SE module embedded on Market1501, DukeMTMC-ReID, and CUHK03 (unit: %)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP | CUHK03 Rank-1 | CUHK03 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 90.23 | 76.85 | 81.60 | 65.11 | 58.50 | 56.34 |
| Baseline+SE | 91.15 | 77.85 | 82.81 | 67.17 | 60.07 | 58.23 |

Table 3. Performance comparison of the model with the improved activation function on Market1501, DukeMTMC-ReID, and CUHK03 (unit: %)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP | CUHK03 Rank-1 | CUHK03 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 90.23 | 76.85 | 81.60 | 65.11 | 58.50 | 56.34 |
| Baseline+DyReLU | 91.21 | 78.90 | 83.75 | 67.66 | 61.50 | 59.01 |

Table 4. Performance comparison of the model with gradient centralization on Market1501, DukeMTMC-ReID, and CUHK03 (unit: %)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP | CUHK03 Rank-1 | CUHK03 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 90.23 | 76.85 | 81.60 | 65.11 | 58.50 | 56.34 |
| Baseline+GC | 90.38 | 77.05 | 82.05 | 66.28 | 59.71 | 56.63 |

Table 5. Ablation study results on Market1501, DukeMTMC-ReID, and CUHK03 (unit: %)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP | CUHK03 Rank-1 | CUHK03 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 90.23 | 76.85 | 81.60 | 65.11 | 58.50 | 56.34 |
| Baseline+SE | 91.15 | 77.85 | 82.81 | 67.17 | 60.07 | 58.23 |
| Baseline+DyReLU | 91.21 | 78.90 | 83.75 | 67.66 | 61.50 | 59.01 |
| Baseline+GC | 90.38 | 77.05 | 82.05 | 66.28 | 59.71 | 56.63 |
| Baseline+SE+DyReLU | 91.39 | 78.57 | 83.62 | 67.55 | 58.93 | 56.62 |
| Baseline+SE+GC | 90.68 | 78.15 | 83.39 | 67.72 | 59.93 | 57.79 |
| Baseline+DyReLU+GC | 92.40 | 79.92 | 83.98 | 68.50 | 62.00 | 60.48 |
| Baseline+SE+DyReLU+GC | 91.09 | 78.31 | 83.39 | 68.03 | 59.86 | 58.58 |

Table 6. Performance comparison of different models on Market1501, DukeMTMC-ReID, and CUHK03 (unit: %)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP | CUHK03 Rank-1 | CUHK03 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| SVDnet[20] | 82.3 | 62.1 | 76.7 | 56.8 | 41.5 | 37.3 |
| PAN[21] | 82.81 | 63.35 | 71.59 | 51.51 | 36.29 | 34.00 |
| PDC[22] | 84.14 | 63.41 | 78.29 | — | — | — |
| AACN[23] | 85.90 | 66.87 | 76.84 | 59.25 | 79.14 | 78.37 |
| GLAD[24] | 89.9 | 73.9 | 82.2 | — | — | — |
| HA-CNN[25] | 91.2 | 75.7 | 80.5 | 63.8 | 41.7 | 38.6 |
| SCPNet | 90.23 | 76.85 | 81.60 | 65.11 | 58.50 | 56.34 |
| SCPNet+DyReLU+GC | 92.40 | 79.92 | 83.98 | 68.50 | 62.00 | 60.48 |
| SCPNet+SE+DyReLU+GC | 91.09 | 78.31 | 83.39 | 68.03 | 59.86 | 58.58 |

Table 7. Comparison of generalization ability (unit: %)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-ReID Rank-1 | DukeMTMC-ReID mAP | CUHK03 Rank-1 | CUHK03 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| SVDnet[20] | 82.3 | 62.1 | 76.7 | 56.8 | 41.5 | 37.3 |
| SVDnet+SE+DyReLU+GC | 82.9 | 63.3 | 77.8 | 58.9 | 42.3 | 38.7 |
| PAN[21] | 82.81 | 63.35 | 71.59 | 51.51 | 36.29 | 34.00 |
| PAN+SE+DyReLU+GC | 83.43 | 64.59 | 72.81 | 53.62 | 37.19 | 35.52 |
| SCPNet | 90.23 | 76.85 | 81.60 | 65.11 | 58.50 | 56.34 |
| SCPNet+SE+DyReLU+GC | 91.09 | 78.31 | 83.39 | 68.03 | 59.86 | 58.58 |
[1] YE M, SHEN J, LIN G, et al. Deep learning for person re-identification: A survey and outlook[J/OL]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021(2021-01-26)[2021-02-01]. https://ieeexplore.ieee.org/document/9336268.
[2] ZHENG Z, ZHENG L, YANG Y. A discriminatively learned CNN embedding for person reidentification[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2017, 14(1): 1-20.
[3] WANG F, ZUO W, LIN L, et al. Joint learning of single-image and cross-image representations for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1288-1296.
[4] SUN Y, ZHENG L, YANG Y, et al. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 480-496.
[5] ZHAO H, TIAN M, SUN S, et al. Spindle Net: Person re-identification with human body region guided feature decomposition and fusion[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1077-1085.
[6] MIAO J, WU Y, LIU P, et al. Pose-guided feature alignment for occluded person re-identification[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 542-551.
[7] LIN Y, ZHENG L, ZHENG Z, et al. Improving person re-identification by attribute and identity learning[J]. Pattern Recognition, 2019, 95: 151-161. doi: 10.1016/j.patcog.2019.06.006.
[8] FAN X, LUO H, ZHANG X, et al. SCPNet: Spatial-channel parallelism network for joint holistic and partial person re-identification[C]//Asian Conference on Computer Vision. Berlin: Springer, 2018: 19-34.
[9] HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. (2017-11-21)[2020-12-01]. https://arxiv.org/abs/1703.07737.
[10] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[11] CHEN Y, DAI X, LIU M, et al. Dynamic ReLU[C]//European Conference on Computer Vision. Berlin: Springer, 2020: 351-367.
[12] IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//International Conference on Machine Learning. New York: ACM, 2015: 448-456.
[13] QIAO S, WANG H, LIU C, et al. Weight standardization[EB/OL]. (2020-08-09)[2020-12-01]. https://arxiv.org/abs/1903.10520.
[14] YONG H, HUANG J, HUA X, et al. Gradient centralization: A new optimization technique for deep neural networks[C]//European Conference on Computer Vision. Berlin: Springer, 2020: 635-652.
[15] ZHENG L, SHEN L, TIAN L, et al. Scalable person re-identification: A benchmark[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1116-1124.
[16] ZHENG Z, ZHENG L, YANG Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3754-3762.
[17] LI W, ZHAO R, XIAO T, et al. DeepReID: Deep filter pairing neural network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 152-159.
[18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[19] CHEN Y C, ZHENG W S, LAI J H, et al. An asymmetric distance model for cross-view feature mapping in person reidentification[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 27(8): 1661-1675.
[20] SUN Y, ZHENG L, DENG W, et al. SVDNet for pedestrian retrieval[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3800-3808.
[21] ZHENG Z, ZHENG L, YANG Y. Pedestrian alignment network for large-scale person re-identification[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 29(10): 3037-3045.
[22] SU C, LI J, ZHANG S, et al. Pose-driven deep convolutional model for person re-identification[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3960-3969.
[23] XU J, ZHAO R, ZHU F, et al. Attention-aware compositional network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2119-2128.
[24] WEI L, ZHANG S, YAO H, et al. GLAD: Global-local-alignment descriptor for pedestrian retrieval[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 420-428.
[25] LI W, ZHU X, GONG S. Harmonious attention network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2285-2294.