Citation: | CHEN Lin, GAO Zan, SONG Xuemeng, et al. A cross-modal pedestrian Re-ID algorithm based on dual attribute information[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(4): 647-656. doi: 10.13700/j.bh.1001-5965.2020.0614(in Chinese) |
Through the investigation of cross-modal retrieval, the use of attribute information can enhance the semantic representation of extracted features. The attributes of the pedestrian image and text are not used adequately in the existing cross-modal pedestrian Re-ID algorithms based on natural language. To tackle the above issues, a novel cross-modal pedestrian Re-ID algorithm based on dual attribute information is proposed. Specifically, the attribute information of the pedestrian image and the attribute information of pedestrian text descriptions are fully and simultaneously explored, and the dual attribute space is also built to improve the distinguishability and semantic expression of extracted image and text features. Extensive experimental results on a public cross-modal pedestrian Re-ID dataset CUHK-PEDES demonstrate that the proposed algorithm is comparable with state-of-the-art algorithm CMAAM (Top-1 56.68%), the retrieval accuracy Top-1 of the proposed algorithm reaches 56.42%, and Top-5 and Top-10 are improved by 0.45% and 0.29% respectively. Besides, the retrieval accuracy of cross-modal pedestrian images can be significantly improved if the class information is provided in the gallery image pool and is used to extract attribute features, and Top-1 can reach 64.88%. The ablation study also proves the importance of the text attribute and image attribute used by the proposed algorithm and the effectiveness of the dual attribute space.
[1] |
ZHENG L, YANG Y, HAUPTMANN A G. Person re-identification: Past, present and future[EB/OL]. (2016-10-11)[2020-10-30]. http://arxiv.org/abs/1610.02984.
|
[2] |
罗浩, 姜伟, 范星, 等. 基于深度学习的行人重识别研究进展[J]. 自动化学报, 2019, 45(11): 2032-2049. https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO201911002.htm
LUO H, JIANG W, FAN X, et al. A survey on deep learning based person re-identification[J]. Acta Automatica Sinica, 2019, 45(11): 2032-2049(in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO201911002.htm
|
[3] |
YE M, SHEN J, LIN G, et al. Deep learning for person re-identification: A survey and outlook. (2020-01-13)[2020-10-30]. https://arxiv.org/abs/2001.04193v1.
|
[4] |
LI S, XIAO T, LI H, et al. Person search with natural language description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1970-1979.
|
[5] |
JI G, LI S J, PANG Y. Fusion-attention network for person search with free-form natural language[J]. Pattern Recognition Letters, 2018, 116: 205-211. doi: 10.1016/j.patrec.2018.10.020
|
[6] |
CHEN T, XU C, LUO J. Improving text-based person search by spatial matching and adaptive threshold[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2018: 1879-1887.
|
[7] |
CHEN D, LI H, LIU X, et al. Improving deep visual representation for person re-identification by global and local image-language association[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 56-73.
|
[8] |
WANG Y, BO C, WANG D, et al. Language person search with mutually connected classification loss[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2019: 2057-2061.
|
[9] |
ZHANG Y, LU H. Deep cross-modal projection learning for image-text matching[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 707-723.
|
[10] |
AGGARWAL S, BABU R V, CHAKRABORTY A. Text-based person search via attribute-aided matching[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2020: 2617-2625.
|
[11] |
LI D, CHEN X, HUANG K Q. Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios[C]//Proceedings of the Asian Conference on Pattern Recognition. Piscataway: IEEE Press, 2015: 111-115.
|
[12] |
DENG Y, LUO P, LOY C C, et al. Pedestrian attribute recognition at far distance[C]// Proceedings of the ACM International Conference on Multimedia. New York: ACM, 2014: 789-792.
|
[13] |
WANG J, ZHU X, GONG S, et al. Attribute recognition by joint recurrent learning of context and correlation[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 17467750.
|
[14] |
LIN Y, ZHENG L, ZHENG Z, et al. Improving person re-identification by attribute and identity learning[J]. Pattern Recognition, 2019, 95: 151-161. doi: 10.1016/j.patcog.2019.06.006
|
[15] |
MATSUKAWA T, SUZUKI E. Person re-identification using CNN features learned from combination of attributes[C]//Proceedings of the IEEE International Conference on Pattern Recognition. Piscataway: IEEE Press, 2016: 2428-2433.
|
[16] |
CHEN W, CHEN X, ZHANG J, et al. Beyond triplet loss: A deep quadruplet network for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 17355323.
|
[17] |
ALEXANDER H, LUCAS B, BASTIAN L. In defense of the triplet loss for person reidentification[EB/OL]. (2017-03-22)[2020-10-30]. https://arxiv.org/abs/1703.07737.
|
[18] |
YIN J, WU A, ZHENG W. Fine-grained person re-identification[J]. International Journal of Computer Vision, 2020, 128: 1654-1672. doi: 10.1007/s11263-019-01259-0
|
[19] |
GAO Z, GAO L S, ZHANG H, et al. Deep spatial pyramid feature collaborative reconstruction for partial person reid[C]//Proceedings of the ACM International Conference on Multimedia. New York: ACM, 2019: 1879-1887.
|
[20] |
ZHENG Z, YANG X, YU Z, et al. Joint discriminative and generative learning for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1335-1344.
|
[21] |
YANG Q, WU A, ZHENG W, et al. Person re-identification by contour sketch under moderate clothing change[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(6): 2029-2046. doi: 10.1109/TPAMI.2019.2960509
|
[22] |
WANG B, YANG Y, XU X, et al. Adversarial cross-modal retrieval[C]//Proceedings of the ACM International Conference on Multimedia. New York: ACM, 2017: 154-162.
|
[23] |
JING Y, SI C, WANG J, et al. Pose-guided joint global and attentive local matching network for text-based person search[EB/OL]. (2018-09-22)[2020-10-30]. https://arxiv.org/abs/1809.08440v2.
|
[24] |
LOPER E, KLEIN E, BIRD S. Natural language processing with python-natural language toolkit[CP/OL]. (2019-09-04)[2020-10-30]. http://www.nltk.org/book/.
|
[25] |
LIU X, ZHAO H, TIAN M, et al. HydraPlus-Net: Attentive deep features for pedestrian analysis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 350-359.
|
[26] |
HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
|
[27] |
LUO Y, ZHENG Z, ZHENG L, et al. Macro-micro adversarial network for human parsing[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 424-440.
|
[28] |
SUN B, SAENKO K. Deep coral: Correlation alignment for deep domain adaptation[C]//Proceedings of the European Conference on Computer Vision. Berlin, German: Springer, 2016: 443-450.
|
[29] |
SHI B, JI L, LU P, et al. Knowledge aware semantic concept expansion for image-text matching[C]//Proceedings of the International Joint Conference on Artificial Intelligence. San Francisco: Margan Kaufmann, 2019: 5182-5189.
|
[30] |
HUANG Y, WU Q, SONG C, et al. Learning semantic concepts and order for image and sentence matching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6163-6171.
|
[31] |
KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. (2017-01-30)[2020-10-30]. https://arxiv.org/abs/1412.6980.
|
[32] |
LI S, XIAO T, LI H, et al. Identity-aware textual-visual matching with latent co-attention[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1908-1917.
|
[33] |
ZHENG Z, ZHENG L, GARRETT M, et al. Dual-path convolutional image-text embeddings with instance loss[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2020, 16(2): 1-23.
|