
A fast long-term visual tracking algorithm based on deep learning

HOU Zhiqiang, MA Jingyuan, HAN Ruoxue, MA Sugang, YU Wangsheng, FAN Jiulun

Citation: HOU Z Q, MA J Y, HAN R X, et al. A fast long-term visual tracking algorithm based on deep learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(8): 2391-2403 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0645

doi: 10.13700/j.bh.1001-5965.2022.0645
• Funds: National Natural Science Foundation of China (62072370)
• Corresponding author e-mail: hzq@xupt.edu.cn
• CLC number: TP391.4
• Abstract: In complex long-term visual tracking scenarios involving target scale variation, occlusion, and out-of-view motion, existing deep-learning-based visual tracking algorithms struggle to track the target accurately in real time. To address this problem, a fast long-term visual tracking algorithm is proposed, consisting of a fast short-term tracker and a fast global re-detection module. An attention module fusing second-order channel attention with regional spatial attention is added to the baseline SiamRPN to form the short-term tracker, improving its precision and success rate while preserving its speed. To give the improved short-term tracker fast long-term tracking capability, a proposed template-matching-based global re-detection module is added; it uses a lightweight network and a fast similarity judgment method to speed up re-detection. Tests on five datasets (OTB100, LaSOT, UAV20L, VOT2018-LT, and VOT2020-LT) show that the proposed algorithm achieves superior long-term tracking performance at an average speed of 104 frame/s.
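To make the pipeline concrete, below is a minimal sketch (not the paper's implementation) of the long-term tracking loop the abstract describes: a SiamRPN-style short-term tracker runs on every frame, and a template-matching global re-detector takes over whenever tracking confidence collapses. All object and method names (tracker.track, redetector.propose, redetector.similarity) are hypothetical placeholders; the 0.99 threshold follows the best-performing value in Table 3.

    import torch

    SIM_THRESHOLD = 0.99  # similarity threshold; Table 3 sweeps 0.91-0.99

    def track_sequence(frames, template, tracker, redetector):
        """Short-term tracking with confidence-triggered global re-detection."""
        box, boxes = None, []
        for frame in frames:
            # 1) Local short-term tracking: SiamRPN-style search around the last box.
            box, score = tracker.track(frame, template, prev_box=box)
            # 2) On low confidence (occlusion, out of view), search the whole frame.
            if score < SIM_THRESHOLD:
                candidates = redetector.propose(frame)              # global candidate boxes
                sims = redetector.similarity(candidates, template)  # template matching
                best = int(torch.argmax(sims))
                if float(sims[best]) >= SIM_THRESHOLD:
                    box = candidates[best]                          # re-initialize tracking
            boxes.append(box)
        return boxes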

     

Figure 1. Flowchart of the long-term visual tracking algorithm

Figure 2. Fused attention module

Figure 3. Second-order channel attention module
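As an illustration of the idea behind Figure 3, the following is a minimal PyTorch sketch of a second-order channel attention block: channels are gated by statistics of the C×C channel covariance matrix rather than first-order average pooling. The reduction ratio and layer sizes are assumptions for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn

    class SecondOrderChannelAttention(nn.Module):
        """Gate channels using second-order (covariance) statistics."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            feat = x.reshape(b, c, h * w)
            feat = feat - feat.mean(dim=2, keepdim=True)     # center each channel
            cov = feat @ feat.transpose(1, 2) / (h * w - 1)  # B x C x C covariance
            desc = cov.mean(dim=2)                           # one statistic per channel
            gates = self.fc(desc).reshape(b, c, 1, 1)        # channel weights in (0, 1)
            return x * gates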

Figure 4. Regional spatial attention module
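Similarly, a hedged sketch of what a regional spatial attention block might look like: the feature map is split into a grid, and a CBAM-style spatial gate is computed within each cell, so every region is weighted against its own statistics rather than against the whole map (Table 2 compares this against conventional spatial attention). The 2×2 grid and kernel size are illustrative assumptions.

    import torch
    import torch.nn as nn

    class RegionalSpatialAttention(nn.Module):
        """Spatial gating computed independently within each grid cell."""
        def __init__(self, grid: int = 2, kernel: int = 3):
            super().__init__()
            self.grid = grid
            self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            g = self.grid
            gates = torch.zeros(b, 1, h, w, device=x.device, dtype=x.dtype)
            for i in range(g):
                for j in range(g):
                    ys = slice(i * h // g, (i + 1) * h // g)
                    xs = slice(j * w // g, (j + 1) * w // g)
                    cell = x[:, :, ys, xs]
                    # Per-region avg + max descriptors, as in CBAM's spatial branch.
                    stats = torch.cat([cell.mean(dim=1, keepdim=True),
                                       cell.max(dim=1, keepdim=True).values], dim=1)
                    gates[:, :, ys, xs] = torch.sigmoid(self.conv(stats))
            return x * gates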

Figure 5. Re-detection module
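The fast similarity judgment inside the re-detection module can be sketched as embedding the template and all candidate crops with a lightweight backbone and ranking candidates by similarity; Tables 4-6 compare GhostNet, EfficientNet, and RepVGG-A1, among others, as such backbones. The cosine measure and the backbone interface below are stand-in assumptions, not the paper's exact matching function.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def similarity_scores(backbone, template_img: torch.Tensor,
                          candidate_imgs: torch.Tensor) -> torch.Tensor:
        """Score N candidate crops against the template.

        backbone: any feature extractor mapping (N, 3, H, W) -> (N, D);
        a lightweight network keeps re-detection fast (cf. Tables 4-5).
        """
        z = backbone(template_img.unsqueeze(0))   # (1, D) template embedding
        x = backbone(candidate_imgs)              # (N, D) candidate embeddings
        return F.cosine_similarity(x, z, dim=1)   # (N,) scores in [-1, 1]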

Figure 6. Effect of two sampling methods on candidate box extraction

Figure 7. Tracking results on selected video sequences from the VOT2020-LT dataset

Figure 8. Evaluation curves of tracking results on OTB100

Figure 9. Evaluation curves of tracking results on LaSOT

Figure 10. Evaluation curves of tracking results on UAV20L and comparison of average success rates by attribute

Table 1. Effect of the residual connection on tracking performance on the OTB100 dataset

Algorithm                                     Precision   Success rate
SiamRPN                                       0.851       0.637
SiamRPN + fused attention module − residual   0.876       0.653
SiamRPN + fused attention module              0.888       0.665

Table 2. Effect of regional feature division on tracking performance on the OTB100 dataset

Algorithm                                  Precision   Success rate
SiamRPN                                    0.851       0.637
SiamRPN + conventional spatial attention   0.857       0.641
SiamRPN + regional spatial attention       0.866       0.649

Table 3. Determination accuracy of different thresholds on the VOT2020-LT dataset

Threshold   Precision
0.91        0.660
0.93        0.665
0.95        0.673
0.97        0.676
0.99        0.687

Table 4. Average processing time (unit: s) of images with different attributes for different networks on the VOT2020-LT dataset

Network        Relatively clear target   Complex background   Small target   Blurred target
ResNet50       2.88                      6.23                 5.91           8.25
VGG16          4.65                      8.14                 8.02           12.13
GhostNet       1.89                      2.51                 2.15           2.52
EfficientNet   1.78                      2.913                3.16
RepVGG-A1      1.80                      2.86                 2.33           3.62

Table 5. Complexity and number of parameters of the re-detection module with different networks

Network            Complexity/(10⁹ FLOPs)   Parameters
Re(ResNet50)       4.1                      30.53×10⁶
Re(VGG16)          20.5                     143.35×10⁶
Re(GhostNet)       0.426                    12.3×10⁶
Re(EfficientNet)   2.1                      16×10⁶
Re(RepVGG-A1)      2.6                      17.78×10⁶
Note: FLOPs denotes the number of floating-point operations.
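For readers reproducing Table 5, the parameter column can be checked with a one-liner over any PyTorch module; redetection_module below is a hypothetical stand-in for the module under test.

    import torch.nn as nn

    def param_count(module: nn.Module) -> int:
        """Trainable parameters; compare against the ×10⁶ figures in Table 5."""
        return sum(p.numel() for p in module.parameters() if p.requires_grad)

    # e.g. print(param_count(redetection_module))  # hypothetical module instance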

Table 6. Tracking results on the LaSOT dataset of SiamRPN with re-detection modules built on different networks

Algorithm                    Precision   Normalized precision   Success rate   Speed/(frame·s⁻¹)
SiamRPN                      0.395       0.467                  0.408          142
SiamRPN + Re(ResNet50)       0.496       0.579                  0.475          95
SiamRPN + Re(VGG16)          0.482       0.561                  0.463          77
SiamRPN + Re(GhostNet)       0.417       0.501                  0.411          123
SiamRPN + Re(EfficientNet)   0.451       0.535                  0.443          115
SiamRPN + Re(RepVGG-A1)      0.480       0.559                  0.466          111

Table 7. Ablation experiment results

SiamRPN   Fused attention   Global re-detection   Precision   Normalized precision   Success rate   Speed/(frame·s⁻¹)
√         –                 –                     0.395       0.467                  0.408          142
√         √                 –                     0.424       0.507                  0.439          134
√         –                 √                     0.480       0.559                  0.466          111
√         √                 √                     0.505       0.589                  0.516          106

Table 8. Comparison of F-scores on the VOT2018-LT dataset

Algorithm   F-score
SLT         0.456
SiamVGG     0.459
SiamRPN     0.499
SYT         0.509
ATOM        0.510
MMLT        0.546
STMTrack    0.571
DiMP50      0.598
MBMD        0.610
SPLT        0.616
Ours        0.625

Table 9. Comparison of F-scores on the VOT2020-LT dataset

Algorithm   F-score
FuCoLoT     0.409
D3S         0.438
SiamRN      0.444
SiamRPN     0.462
ASINT       0.503
STMTrack    0.550
mbdet       0.563
DiMP50      0.567
Ours        0.579

Table 10. Comparison of average speeds of long-term visual tracking algorithms

Algorithm    Average speed/(frame·s⁻¹)
mbdet        2
MBMD         2.7
SiamR-CNN    4.7
MMLT         6.1
FuCoLoT      6.8
LTMU         13
RLT-DiMP     14.2
SLT          14.9
SYT          17.8
ASINT        19
SPLT         25.7
Ours         104
Publication history
• Received: 2022-07-27
• Accepted: 2022-11-26
• Available online: 2023-01-10
• Issue published: 2024-08-28
