Volume 50 Issue 2
Feb.  2024
WANG J,LI P T,ZHAO R F,et al. A person re-identification method for fusing convolutional attention and Transformer architecture[J]. Journal of Beijing University of Aeronautics and Astronautics,2024,50(2):466-476 (in Chinese) doi: 10.13700/j.bh.1001-5965.2022.0456

A person re-identification method for fusing convolutional attention and Transformer architecture

doi: 10.13700/j.bh.1001-5965.2022.0456
Funds:  National Natural Science Foundation of China (61806123,42101443); National Key R&D Program of China (2019YFD0900805)
  • Corresponding author: E-mail: y-zhang@shou.edu.cn
  • Received Date: 02 Jun 2022
  • Accepted Date: 25 Jul 2022
  • Available Online: 13 Aug 2022
  • Publish Date: 11 Aug 2022
  • Abstract: Person re-identification is one of the key technologies in intelligent security systems. To build a person re-identification model suited to a variety of complex scenarios, this article proposes a method that fuses convolutional attention with a Transformer architecture (FCAT), built on existing convolutional neural networks and Transformer models, to strengthen the Transformer's attention to local detail. The method embeds convolutional spatial attention and channel attention to heighten the focus on important regions and important channel features of the image, thereby indirectly improving the Transformer's ability to extract local detail features. Comparative and ablation experiments on three public person re-identification datasets show that the proposed method achieves comparable results on non-occluded datasets and significantly improves performance on occluded datasets. The model is also more lightweight, yielding faster inference without adding computational cost or model parameters.
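The abstract describes gating a feature map with channel attention (which channels matter) and spatial attention (which regions matter) before it reaches the Transformer. Below is a minimal NumPy sketch of that two-stage gating in the CBAM style; the weight shapes, the reduction ratio `r`, and the simplified spatial gate (a plain sum in place of a learned convolution) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Gate each channel of x (shape (C, H, W)) by its global importance."""
    # Global average- and max-pooling over the spatial dimensions.
    avg = x.mean(axis=(1, 2))                     # (C,)
    mx = x.max(axis=(1, 2))                       # (C,)
    # Shared two-layer MLP (w1: C -> C/r, w2: C/r -> C), then a sigmoid gate.
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0) +
                   w2 @ np.maximum(w1 @ mx, 0))   # (C,), values in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Gate each spatial position of x (shape (C, H, W)) by its importance."""
    # Average- and max-pool across channels; a learned conv would normally
    # mix these two maps, here they are simply summed for illustration.
    avg = x.mean(axis=0)                          # (H, W)
    mx = x.max(axis=0)                            # (H, W)
    gate = sigmoid(avg + mx)                      # (H, W), values in (0, 1)
    return x * gate[None, :, :]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))             # reduction: C -> C/r
w2 = rng.standard_normal((C, C // r))             # expansion: C/r -> C
out = spatial_attention(channel_attention(x, w1, w2))
print(out.shape)  # (8, 4, 4)
```

Because both gates lie in (0, 1), every output activation is a damped copy of the input, so the module reweights features without changing the tensor shape the downstream Transformer expects.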

     

  • [1]
    YE M, SHEN J B, LIN G J, et al. Deep learning for person re-identification: A survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 2872-2893. doi: 10.1109/TPAMI.2021.3054775
    [2]
    ZHENG L, YANG Y, HAUPTMANN A G. Person re-identification: Past, present and future[EB/OL]. (2016-10-10) [2022-05-20]. https://arxiv.org/abs/1610.02984.pdf.
    [3]
    LUO H, GU Y Z, LIAO X Y, et al. Bag of tricks and a strong baseline for deep person re-identification[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2020: 1487-1495.
    [4]
    HE S T, LUO H, WANG P C, et al. TransReID: Transformer-based object re-identification[C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2022: 14993-15002.
    [5]
    PARK H, HAM B. Relation network for person re-identification[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11839-11847. doi: 10.1609/aaai.v34i07.6857
    [6]
    LI W, ZHU X T, GONG S G. Harmonious attention network for person re-identification[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2285-2294.
    [7]
    CAI H L, WANG Z G, CHENG J X. Multi-scale body-part mask guided attention for person re-identification[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2020: 1555-1564.
    [8]
    张晓伟, 吕明强, 李慧. 基于局部语义特征不变性的跨域行人重识别[J]. 北京航空航天大学学报, 2020, 46(9): 1682-1690. doi: 10.13700/j.bh.1001-5965.2020.0072

    ZHANG X W, LYU M Q, LI H. Cross-domain person re-identification based on partial semantic feature invariance[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(9): 1682-1690 (in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0072
    [9]
    ZHANG Z Z, LAN C L, ZENG W J, et al. Relation-aware global attention for person re-identification[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 3183-3192.
    [10]
    CHEN G Y, GU T P, LU J W, et al. Person re-identification via attention pyramid[J]. IEEE Transactions on Image Processing, 2021, 30: 7663-7676. doi: 10.1109/TIP.2021.3107211
    [11]
    孙义博, 张文靖, 王蓉, 等. 基于通道注意力机制的行人重识别方法[J]. 北京航空航天大学学报, 2022, 48(5): 881-889. doi: 10.13700/j.bh.1001-5965.2020.0684

    SUN Y B, ZHANG W J, WANG R, et al. Pedestrian re-identification method based on channel attention mechanism[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(5): 881-889 (in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0684
    [12]
    VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all You need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
    [13]
    GALASSI A, LIPPI M, TORRONI P. Attention in natural language processing[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(10): 4291-4308.
    [14]
    LIU Y, ZHANG Y, WANG Y X, et al. A survey of visual transformers[J]. IEEE Transaction on Neural Networks and Learning Systems, 2023.
    [15]
    KHAN S, NASEER M, HAYAT M, et al. Transformers in vision: A survey[J]. ACM Computing Surveys, 2022, 54(10): 1-41.
    [16]
    HADJI I, WILDES R P. What do we understand about convolutional networks? [EB/OL]. (2018-03-23) [2022-05-20]. https://arxiv.org/abs/1803.08834.pdf.
    [17]
    SONG C F, HUANG Y, OUYANG W L, et al. Mask-guided contrastive attention model for person re-identification[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1179-1188.
    [18]
    SUN Y F, ZHENG L, YANG Y, et al. Beyond part models: Person retrieval with refined part pooling (and A strong convolutional baseline)[C]// Computer Vision - ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part IV. New York: ACM, 2018: 501-518.
    [19]
    ZHANG X, LUO H, FAN X, et al. AlignedReID: Surpassing human-level performance in person re-identification[EB/OL]. (2017-11-22) [2022-05-20].https://arxiv.org/abs/1711.08184.pdf.
    [20]
    SU C, LI J N, ZHANG S L, et al. Pose-driven deep convolutional model for person re-identification[C]// 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3980-3989.
    [21]
    WANG G A, YANG S, LIU H Y, et al. High-order information matters: learning relation and topology for occluded person re-identification[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6448-6457.
    [22]
    XU J, ZHAO R, ZHU F, et al. Attention-aware compositional network for person re-identification[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2119-2128.
    [23]
    GUO M H, XU T X, LIU J J, et al. Attention mechanisms in computer vision: a survey[J]. Computational Visual Media, 2022, 8(3): 331-368. doi: 10.1007/s41095-022-0271-y
    [24]
    CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Cham: Springer, 2020: 213-229.
    [25]
    DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. (2020-10-22) [2022-05-20]. https://arxiv.org/abs/2010.11929.pdf.
    [26]
    LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2022: 9992-10002.
    [27]
    WU B C, XU C F, DAI X L, et al. Visual transformers: Token-based image representation and processing for computer vision[EB/OL]. (2020-06-05) [2022-05-21]. https://arxiv.org/abs/2006.03677.pdf.
    [28]
    SRINIVAS A, LIN T Y, PARMAR N, et al. Bottleneck transformers for visual recognition[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 16514-16524.
    [29]
    TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[EB/OL]. (2020-12-23) [2022-05-21].https://arxiv.org/abs/2012.12877.pdf.
    [30]
    D’ASCOLI S, TOUVRON H, LEAVITT M L, et al. ConViT: Improving vision transformers with soft convolutional inductive biases[J]. Journal of Statistical Mechanics:Theory and Experiment, 2022, 2022(11): 114005. doi: 10.1088/1742-5468/ac9830
    [31]
    ZHANG Q L, YANG Y B. ResT: an efficient transformer for visual recognition[EB/OL]. (2021-05-28) [2022-05-21]. https://arxiv.org/abs/2105.13677.pdf.
    [32]
    WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
    [33]
    PARK J, WOO S, LEE J Y, et al. BAM: Bottleneck attention module[EB/OL]. (2018-07-17) [2022-05-21]. https://arxiv.org/abs/1807.06514.pdf.
    [34]
    LI X, WANG W H, HU X L, et al. Selective kernel networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 510-519.
    [35]
    YAO H T, ZHANG S L, HONG R C, et al. Deep representation learning with part loss for person re-identification[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2860-2871. doi: 10.1109/TIP.2019.2891888
    [36]
    ZHANG Z Z, LAN C L, ZENG W J, et al. Densely semantically aligned person re-identification[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 667-676.
    [37]
    YE M, LAN X Y, WANG Z, et al. Bi-directional center-constrained top-ranking for visible thermal person re-identification[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 407-419. doi: 10.1109/TIFS.2019.2921454
    [38]
    ZHENG L, SHEN L Y, TIAN L, et al. Scalable person re-identification: A benchmark[C]// 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 1116-1124.
    [39]
    ZHENG Z D, ZHENG L, YANG Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]// 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3774-3782.
    [40]
    MIAO J X, WU Y, LIU P, et al. Pose-guided feature alignment for occluded person re-identification[C]// 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 542-551.
    [41]
    WANG X G, DORETTO G, SEBASTIAN T, et al. Shape and appearance context modeling[C]// 2007 IEEE 11th International Conference on Computer Vision. Piscataway: IEEE Press, 2007: 1-8.
    [42]
    ZHAO H Y, TIAN M Q, SUN S Y, et al. Spindle net: Person re-identification with human body region guided feature decomposition and fusion[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 907-915.
    [43]
    WANG G S, YUAN Y F, CHEN X, et al. Learning discriminative features with multiple granularities for person re-identification[C]// Proceedings of the 26th ACM international conference on Multimedia. New York: ACM, 2018: 274-282.
    [44]
    ZHAO L M, LI X, ZHUANG Y T, et al. Deeply-learned part-aligned representations for person re-identification[C]// 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 3239-3248.
    [45]
    JIA M X, CHENG X H, LU S J, et al. Learning disentangled representation implicitly via transformer for occluded person re-identification[J]. IEEE Transactions on Multimedia, 2023, 25: 1294-1305. doi: 10.1109/TMM.2022.3141267
  • 加载中

    Figures(3)  / Tables(4)
