Journal of Beijing University of Aeronautics and Astronautics, 2022, Vol. 48, Issue (4): 647-656. doi: 10.13700/j.bh.1001-5965.2020.0614


A cross-modal pedestrian Re-ID algorithm based on dual attribute information

CHEN Lin1, GAO Zan2, SONG Xuemeng1, WANG Yinglong2, NIE Liqiang1

  1. School of Computer Science and Technology, Shandong University, Qingdao 266200, China;
  2. Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China
  • Received: 2020-11-04 Published: 2022-04-27
  • Corresponding author: NIE Liqiang E-mail: nieliqiang@gmail.com
  • Funding: National Natural Science Foundation of China (61772310, 61702300, U1936203, 61872270); National Key R&D Program of China (2018AAA0102502); Natural Science Foundation of Shandong Province (ZR2019JQ23); Major Science and Technology Innovation Project of Shandong Province (2019JZZY010118); Youth Innovation Team Development Plan of Shandong Higher Education Institutions (2020KJN012); Jinan Higher Education Institutions Innovation Team Program (2018GXRC014)


Abstract: Studies of cross-modal retrieval show that attribute information can enhance the semantic expressiveness of extracted features, yet existing natural-language-based cross-modal pedestrian Re-ID algorithms do not exploit the attributes of pedestrian images and texts adequately. To tackle this issue, a novel cross-modal pedestrian Re-ID algorithm based on dual attribute information is proposed. Specifically, the attribute information of pedestrian images and of pedestrian text descriptions is explored fully and simultaneously, a dual attribute space built on text attributes and image attributes is constructed, and an end-to-end network based on the latent space and the attribute space improves the distinguishability and semantic expressiveness of the extracted image and text features. Extensive experiments on the public cross-modal pedestrian Re-ID dataset CUHK-PEDES demonstrate that the proposed algorithm reaches a Top-1 retrieval accuracy of 56.42%, comparable with the state-of-the-art CMAAM algorithm (Top-1 56.68%), while its Top-5 and Top-10 accuracies improve on CMAAM by 0.45% and 0.29%, respectively. Moreover, for application scenarios in which identity labels are available in the gallery image pool, using this class information to extract attribute features substantially improves cross-modal retrieval accuracy, raising Top-1 to 64.88%. An ablation study confirms the importance of the text and image attributes used by the proposed algorithm and the effectiveness of the dual attribute space.
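The dual-space idea described in the abstract, scoring an image-text pair by fusing a latent-space similarity with an attribute-space similarity, can be sketched minimally as below. The projection matrices `W_img`/`W_txt`, the fusion weight `alpha`, and the weighted-sum fusion are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def dual_space_score(img_feat, txt_feat, img_attr, txt_attr,
                     W_img, W_txt, alpha=0.5):
    """Fuse similarities from a shared latent space and an attribute space.

    img_feat/txt_feat are modality-specific features projected into a common
    latent space by W_img/W_txt; img_attr/txt_attr are attribute vectors
    already living in a shared attribute space.
    """
    latent_sim = cosine(img_feat @ W_img, txt_feat @ W_txt)  # latent space
    attr_sim = cosine(img_attr, txt_attr)                    # attribute space
    return alpha * latent_sim + (1 - alpha) * attr_sim
```

In an end-to-end network the projections would be learned layers trained with matching losses in both spaces; here they are fixed matrices only to show how the two similarity terms combine.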

Key words: cross-modal retrieval, matching algorithm, pedestrian attribute information, feature representation, feature fusion
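The Top-1/Top-5/Top-10 accuracies reported in the abstract follow the standard retrieval protocol: a text query scores a hit at rank k if any of its k most similar gallery images shares the query's identity. A generic sketch of that protocol (function and variable names are ours, not the authors' evaluation code):

```python
import numpy as np

def topk_accuracy(query_feats, gallery_feats, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Fraction of queries whose identity appears among the k nearest gallery items."""
    # L2-normalise so the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                                  # (num_queries, num_gallery)
    order = np.argsort(-sim, axis=1)               # gallery indices, best first
    hits = gallery_ids[order] == query_ids[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}

# Toy check: one text query whose identity matches the nearest gallery image.
q = np.array([[1.0, 0.0]])
g = np.array([[0.9, 0.1], [0.0, 1.0]])
print(topk_accuracy(q, g, np.array([3]), np.array([3, 4]), ks=(1,)))  # {1: 1.0}
```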



Copyright © Editorial Office of Journal of Beijing University of Aeronautics and Astronautics
Address: Editorial Office, Journal of Beijing University of Aeronautics and Astronautics, 37 Xueyuan Road, Haidian District, Beijing 100191, China E-mail: jbuaa@buaa.edu.cn