A fast long-term visual tracking algorithm based on deep learning

HOU Zhiqiang; MA Jingyuan; HAN Ruoxue; MA Sugang; YU Wangsheng; FAN Jiulun

doi:10.13700/j.bh.1001-5965.2022.0645

Volume 50 Issue 8

Aug. 2024

Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(8): 2391-2403.

Citation: HOU Z Q, MA J Y, HAN R X, et al. A fast long-term visual tracking algorithm based on deep learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(8): 2391-2403 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0645

HOU Zhiqiang^{1, 2},
MA Jingyuan^{1, 2},
HAN Ruoxue^{1, 2},
MA Sugang^{1, 2},
YU Wangsheng^{3},
FAN Jiulun^{1}

1. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
2. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
3. College of Information and Navigation, Air Force Engineering University, Xi'an 710077, China

Funds: National Natural Science Foundation of China (62072370)

Corresponding author: E-mail: hzq@xupt.edu.cn
Received Date: 27 Jul 2022
Accepted Date: 26 Nov 2022
Publish Date: 10 Jan 2023

Abstract

Current deep learning-based visual tracking algorithms struggle to track a target accurately and in real time in complex long-term monitoring environments, where the target may change size, become occluded, or leave the field of view. To address this problem, a fast long-term visual tracking algorithm is proposed that consists of a fast short-term tracking algorithm and a fast global re-detection module. First, an attention module that fuses second-order channel attention with regional spatial attention is added to the base algorithm SiamRPN to form the short-term tracker. Then, to give the improved short-term tracker a fast long-term tracking ability, a global re-detection module based on template matching is added; it uses a lightweight network and a fast similarity judgment method to speed up re-detection. The proposed algorithm is evaluated on five datasets (OTB100, LaSOT, UAV20L, VOT2018-LT, and VOT2020-LT). The experimental results show that it achieves strong long-term tracking performance at an average tracking speed of 104 frames per second.
Keywords:
- long-term visual tracking
- deep learning
- second-order channel attention
- regional spatial attention
- global re-detection
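
The two-stage structure described in the abstract (a short-term tracker backed by a global re-detection step based on template matching) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the similarity judgment or the switching rule works, so cosine similarity on feature vectors, the names `global_redetect` and `track_step`, and both thresholds are assumptions made purely for illustration. In the paper, the features would come from a lightweight network and the short-term result from the improved SiamRPN.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]          # (x, y, width, height)
Feature = Sequence[float]                # stand-in for a network embedding


def cosine_similarity(a: Feature, b: Feature) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def global_redetect(
    template: Feature,
    candidates: List[Tuple[Box, Feature]],
    accept_threshold: float = 0.8,       # hypothetical value, not from the paper
) -> Optional[Box]:
    """Compare the template with every candidate region in the whole frame and
    return the best-matching box, or None if no candidate is similar enough."""
    best_box: Optional[Box] = None
    best_score = -1.0
    for box, feature in candidates:
        score = cosine_similarity(template, feature)
        if score > best_score:
            best_box, best_score = box, score
    return best_box if best_score >= accept_threshold else None


def track_step(
    template: Feature,
    local_result: Tuple[Box, float],     # (box, confidence) from the short-term tracker
    candidates: List[Tuple[Box, Feature]],
    lost_threshold: float = 0.5,         # hypothetical value, not from the paper
) -> Tuple[Optional[Box], str]:
    """Trust the short-term tracker while it is confident; otherwise fall back
    to global re-detection over the whole frame."""
    box, confidence = local_result
    if confidence >= lost_threshold:
        return box, "short-term"
    redetected = global_redetect(template, candidates)
    if redetected is not None:
        return redetected, "re-detected"
    return None, "lost"


# Toy data: a 3-dimensional "template" and two candidate regions.
template = [1.0, 0.0, 1.0]
candidates = [
    ((10, 10, 40, 40), [0.0, 1.0, 0.0]),     # dissimilar region
    ((200, 120, 40, 40), [0.9, 0.1, 1.1]),   # region resembling the template
]

print(track_step(template, ((12, 11, 40, 40), 0.9), candidates))
print(track_step(template, ((12, 11, 40, 40), 0.2), candidates))
print(track_step(template, ((12, 11, 40, 40), 0.2), candidates[:1]))
```

In this sketch the expensive whole-frame search runs only when the short-term confidence drops below `lost_threshold`, which is one plausible way a long-term tracker could keep its average speed high; whether the paper uses the same trigger is not stated in the abstract.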
