-
Abstract:
The goal of the visible-infrared cross-modality person re-identification task is, given an image of a specific person captured in one modality, to retrieve the images of the same person from an image set captured by cameras of the other modality. Because the imaging mechanisms differ, there are obvious modality discrepancies between images of different modalities. To address this, the loss function is improved from the perspective of metric learning so as to obtain more discriminative information. The cohesiveness of image features is analyzed theoretically, and on this basis a re-identification method based on cohesiveness analysis and a cross-modality nearest-neighbor loss function is proposed to strengthen the cohesiveness of samples from different modalities. The similarity measurement of cross-modality hard samples is transformed into the similarity measurement of cross-modality nearest-neighbor sample pairs and intra-modality sample pairs, which makes the network's optimization of modality cohesiveness more efficient and stable. The proposed method is experimentally verified on a baseline network with global feature representation and a baseline network with part feature representation. The results show that, compared with the baseline methods, the proposed method improves the mean average precision of visible-infrared person re-identification by up to 8.44%, demonstrating its universality across different network architectures; moreover, reliable cross-modality person re-identification results are achieved with small model complexity and low computational cost.
-
Keywords:
- visible-infrared person re-identification
- metric learning
- deep learning
- cross-modality learning
- computer vision
Abstract: The goal of the visible-infrared person re-identification task is, given an image of a specific person captured in one modality, to retrieve the images of the same person from an image set captured by cameras of the other modality. Because of the different imaging mechanisms, there are obvious modality discrepancies between images of different modalities. Therefore, from the perspective of metric learning, the loss function is improved to obtain more discriminative information. The cohesiveness of image features is analyzed theoretically, and on this basis a re-identification method based on cohesiveness analysis and a cross-modality nearest-neighbor loss function is proposed to strengthen the cohesiveness of samples from different modalities. The similarity measurement of cross-modality hard samples is transformed into the similarity measurement of cross-modality nearest-neighbor sample pairs and intra-modality sample pairs, which makes the network's optimization of modality cohesiveness more efficient and stable. The proposed method is experimentally verified on baseline networks with global feature representation and with part feature representation. Compared with the baseline methods, it improves the mean average precision of visible-infrared person re-identification by up to 8.44%, which demonstrates its universality across different network architectures. Moreover, reliable visible-infrared person re-identification results are achieved with small model complexity and low computational cost.
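The core idea in the abstract — replacing the similarity measurement of the hardest cross-modality sample with that of the cross-modality nearest-neighbor pair — can be sketched as follows. This is a minimal numpy illustration of the general idea only, not the paper's exact formulation; the function name, the hinge-with-margin form, and the mining rule are assumptions for illustration.

```python
import numpy as np

def cross_modal_nn_loss(vis_feats, ir_feats, vis_labels, ir_labels, margin=0.3):
    """Illustrative cross-modality nearest-neighbor loss (hypothetical form).

    For each visible anchor, instead of mining the hardest same-identity
    infrared sample, take the *nearest* same-identity infrared neighbor and
    require it to be closer than the nearest different-identity infrared
    sample by at least `margin`.
    """
    # Pairwise Euclidean distances: rows = visible anchors, cols = infrared samples.
    d = np.linalg.norm(vis_feats[:, None, :] - ir_feats[None, :, :], axis=2)
    losses = []
    for i, y in enumerate(vis_labels):
        pos = d[i, ir_labels == y]   # same-identity cross-modality pairs
        neg = d[i, ir_labels != y]   # different-identity cross-modality pairs
        if pos.size == 0 or neg.size == 0:
            continue
        # Nearest-neighbor positive: a gentler target than the hardest positive,
        # which tends to make optimization more stable.
        losses.append(max(0.0, pos.min() - neg.min() + margin))
    return float(np.mean(losses)) if losses else 0.0
```

When the two modalities' features of the same identity already lie close together, the hinge is inactive and the loss is zero; when they are far apart, the nearest-neighbor positive is pulled toward the anchor.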
-
Table 1. Ablation experimental results of global feature representation under the visible-infrared mode on the RegDB dataset (unit: %)

| Method | Rank-1 | Rank-10 | Rank-20 | mAP | mINP |
| --- | --- | --- | --- | --- | --- |
| Baseline | 82.76 | 91.91 | 94.57 | 80.64 | 73.44 |
| CenL 1# | 83.40 | 92.47 | 95.53 | 81.35 | 73.95 |
| CenL 2# | 83.28 | 92.58 | 95.16 | 81.32 | 74.17 |
| Proposed method | 85.11 | 93.50 | 96.06 | 84.18 | 76.70 |

Table 2. Ablation experimental results of global feature representation under the infrared-visible mode on the RegDB dataset (unit: %)

| Method | Rank-1 | Rank-10 | Rank-20 | mAP | mINP |
| --- | --- | --- | --- | --- | --- |
| Baseline | 83.67 | 93.00 | 95.59 | 80.65 | 71.41 |
| CenL 1# | 82.51 | 92.75 | 95.67 | 80.78 | 72.62 |
| CenL 2# | 82.30 | 92.02 | 94.73 | 80.65 | 72.57 |
| Proposed method | 85.08 | 93.00 | 95.87 | 82.80 | 75.40 |

Table 3. Ablation experimental results of part feature representation under the visible-infrared mode on the RegDB dataset (unit: %)

| Method | Rank-1 | Rank-10 | Rank-20 | mAP | mINP |
| --- | --- | --- | --- | --- | --- |
| Baseline | 91.05 | 97.16 | 98.57 | 83.28 | 68.84 |
| Proposed method | 93.94 | 97.87 | 98.96 | 91.72 | 85.99 |

Table 4. Ablation experimental results of part feature representation under the infrared-visible mode on the RegDB dataset (unit: %)

| Method | Rank-1 | Rank-10 | Rank-20 | mAP | mINP |
| --- | --- | --- | --- | --- | --- |
| Baseline | 89.30 | 96.41 | 98.16 | 81.46 | 64.81 |
| Proposed method | 94.43 | 97.80 | 98.55 | 91.85 | 85.76 |

Table 5. Ablation experimental results of part feature representation on the SYSU-MM01 dataset (unit: %)

| Method | Rank-1 | Rank-10 | Rank-20 | mAP | mINP |
| --- | --- | --- | --- | --- | --- |
| Baseline | 58.18 | 90.49 | 95.34 | 55.25 | 39.54 |
| Proposed method | 59.97 | 92.96 | 96.96 | 57.71 | 43.60 |

Table 6. Comparison of the proposed method with state-of-the-art methods on the RegDB dataset under the visible-infrared and infrared-visible modes (unit: %)

Table 7. Comparison of the proposed method with state-of-the-art methods on the SYSU-MM01 dataset under the single-shot all-search setting (unit: %)

-