Volume 50, Issue 2, Feb. 2024
Citation: MA S G, ZHANG Z X, PU L, et al. Real-time robust visual tracking based on spatial attention mechanism[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 419-432 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0329

Real-time robust visual tracking based on spatial attention mechanism

doi: 10.13700/j.bh.1001-5965.2022.0329
Funds: National Natural Science Foundation of China (62072370); Key Research and Development Program of Shaanxi (2018ZDCXL-GY-04-02); Graduate Innovation Fund of Xi’an University of Posts and Telecommunications (CXJJZL2021011)
More Information
  • Corresponding author: E-mail: msg@xupt.edu.cn
  • Received Date: 07 May 2022
  • Accepted Date: 21 Aug 2022
  • Available Online: 16 Sep 2022
  • Publish Date: 14 Sep 2022
  • A real-time object tracking method coupled with a spatial attention mechanism is proposed to enhance the tracking capability of the fully convolutional Siamese network (SiamFC) tracker in complex scenes and to alleviate target drift during tracking. An improved visual geometry group (VGG) network is used as the backbone to strengthen the tracker's ability to model deep target features. The self-attention mechanism is optimized, and a lightweight single convolution attention module (SCAM) is proposed, in which spatial attention is decomposed into two parallel one-dimensional feature coding processes to reduce computational complexity. The initial target template is retained as the first template, and a second template is selected dynamically by analyzing changes in the connected domains of the tracking response map; the target is located after fusing the two templates. Experimental results show that, compared with SiamFC, the proposed algorithm increases the success rate on the OTB100, LaSOT, and UAV123 datasets by 0.082, 0.045, and 0.045, respectively, and the tracking accuracy by 0.118, 0.051, and 0.062. On the VOT2018 dataset, the proposed algorithm improves tracking accuracy, robustness, and expected average overlap by 0.029, 0.276, and 0.134, respectively, relative to SiamFC. The tracking speed approaches 70 frames per second, satisfying real-time tracking requirements.
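    The abstract describes the single convolution attention module (SCAM) only at a high level. The following is a minimal PyTorch sketch of the stated idea of decomposing spatial attention into two parallel one-dimensional feature codings driven by a single shared convolution; the class name SCAMSketch, the reduction ratio, and the pooling choices are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class SCAMSketch(nn.Module):
        # Illustrative lightweight spatial attention block (sketch, not the paper's code).
        # The feature map is pooled along width and along height, each pooled strip is
        # passed through one shared 1x1 convolution, and the two resulting attention
        # vectors are recombined into a 2-D attention map by broadcasting.
        def __init__(self, channels, reduction=8):
            super().__init__()
            mid = max(channels // reduction, 8)
            self.conv = nn.Conv2d(channels, mid, kernel_size=1, bias=False)  # shared single convolution
            self.bn = nn.BatchNorm2d(mid)
            self.act = nn.ReLU(inplace=True)
            self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)  # height branch
            self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)  # width branch

        def forward(self, x):
            # x: (n, c, h, w) feature map from the backbone
            feat_h = x.mean(dim=3, keepdim=True)   # (n, c, h, 1): 1-D coding along height
            feat_w = x.mean(dim=2, keepdim=True)   # (n, c, 1, w): 1-D coding along width
            feat_h = self.act(self.bn(self.conv(feat_h)))
            feat_w = self.act(self.bn(self.conv(feat_w)))
            att_h = torch.sigmoid(self.conv_h(feat_h))  # (n, c, h, 1)
            att_w = torch.sigmoid(self.conv_w(feat_w))  # (n, c, 1, w)
            # broadcasting att_h * att_w yields an (n, c, h, w) spatial attention map
            return x * att_h * att_w

    For example, scam = SCAMSketch(256) followed by y = scam(torch.randn(1, 256, 22, 22)) returns a tensor of the same shape, reweighted by the learned attention map; the two 1-D branches avoid computing a full h×w attention matrix, which is the source of the claimed reduction in computational complexity.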

     

  • [1]
    GAO M, JIN L S, JIANG Y Y, et al. Manifold Siamese network: A novel visual tracking ConvNet for autonomous vehicles[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(4): 1612-1623. doi: 10.1109/TITS.2019.2930337
    [2]
    KOU Z, WU J F, WANG H L, et al. Obstacle visual sensing based on deep learning for low-altitude small unmanned aerial vehicles[J]. Scientia Sinica (Informationis), 2020, 50(5): 692-703 (in Chinese). doi: 10.1360/N112019-00034
    [3]
    MARVASTI-ZADEH S M, CHENG L, GHANEI-YAKHDAN H, et al. Deep learning for visual tracking: A comprehensive survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 3943-3968. doi: 10.1109/TITS.2020.3046478
    [4]
    LUO J H, HAN Y, FAN L Y. Underwater acoustic target tracking: A review[J]. Sensors, 2018, 18(2): 112. doi: 10.3390/s18010112
    [5]
    LIU F, SUN Y N, WANG H J, et al. Adaptive UAV target tracking algorithm based on residual learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(10): 1874-1882 (in Chinese). doi: 10.13700/j.bh.1001-5965.2019.0551
    [6]
    HAN M, WANG J Q, WANG J T, et al. Comprehensive survey on target tracking based on Siamese network[J]. Journal of Hebei University of Science and Technology, 2022, 43(1): 27-41 (in Chinese).
    [7]
    HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. doi: 10.1109/TPAMI.2014.2345390
    [8]
    ZHANG K H, ZHANG L, LIU Q S, et al. Fast visual tracking via dense spatio-temporal context learning[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2014: 127-141.
    [9]
    DANELLJAN M, KHAN F S, FELSBERG M, et al. Adaptive color attributes for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 1090-1097.
    [10]
    MA C, HUANG J B, YANG X K, et al. Hierarchical convolutional features for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 3074-3082.
    [11]
    WANG L J, OUYANG W L, WANG X G, et al. Visual tracking with fully convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 3119-3127.
    [12]
    BHAT G, JOHNANDER J, DANELLJAN M, et al. Unveiling the power of deep tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 493-509.
    [13]
    BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
    [14]
    HE A F, LUO C, TIAN X M, et al. A twofold Siamese network for real-time object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4834-4843.
    [15]
    PU L, FENG X X, HOU Z Q, et al. SiamDA: Dual attention Siamese network for real-time visual tracking[J]. Signal Processing: Image Communication, 2021, 95: 116293. doi: 10.1016/j.image.2021.116293
    [16]
    GUPTA D K, ARYA D, GAVVES E. Rotation equivariant Siamese networks for tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 12357-12366.
    [17]
    VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5000-5008.
    [18]
    ZHANG L C, GONZALEZ-GARCIA A, VAN DE WEIJER J, et al. Learning the model update for Siamese trackers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 4009-4018.
    [19]
    ZHU Z, WU W, ZOU W, et al. End-to-end flow correlation tracking with spatial-temporal attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 548-557.
    [20]
    ZHANG C Y, WANG H, WEN J W, et al. Deeper Siamese network with stronger feature representation for visual tracking[J]. IEEE Access, 2020, 8: 119094-119104. doi: 10.1109/ACCESS.2020.3005511
    [21]
    WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
    [22]
    CAO Y, XU J R, LIN S, et al. GCNet: Non-local networks meet squeeze-excitation networks and beyond[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 1971-1980.
    [23]
    WANG M M, LIU Y, HUANG Z Y. Large margin object tracking with circulant feature maps[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4800-4808.
    [24]
    DANELLJAN M, HÄGER G, KHAN F S, et al. Discriminative scale space tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8): 1561-1575. doi: 10.1109/TPAMI.2016.2609928
    [25]
    BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: Complementary learners for real-time tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1401-1409.
    [26]
    DANELLJAN M, HÄGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 4310-4318.
    [27]
    DANELLJAN M, BHAT G, KHAN F S, et al. ECO: Efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6931-6939.
    [28]
    DANELLJAN M, HÄGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 58-66.
    [29]
    ZHANG Z P, PENG H W. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4586-4595.
    [30]
    LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
    [31]
    BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6181-6190.
    [32]
    ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 103-119.
    [33]
    DANELLJAN M, VAN GOOL L, TIMOFTE R. Probabilistic regression for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 7181-7190.
    [34]
    DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4655-4664.
    [35]
    LI P X, CHEN B Y, OUYANG W L, et al. GradNet: Gradient-guided network for visual object tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6161-6170.
    [36]
    GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1781-1789.
    [37]
    DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: Learning continuous convolution operators for visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 472-488.
    [38]
    YIN Z J, WEN C H, HUANG Z Y, et al. SiamVGG-LLC: Visual tracking using LLC and deeper Siamese networks[C]//Proceedings of the IEEE International Conference on Communication Technology. Piscataway: IEEE Press, 2020: 1683-1687.
    [39]
    LUKEŽIC A, VOJÍR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4847-4856.
    [40]
    XU T Y, FENG Z H, WU X J, et al. Joint group feature selection and discriminative filter learning for robust visual object tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 7949-7959.
    [41]
    DAI K N, WANG D, LU H C, et al. Visual tracking via adaptive spatially-regularized correlation filters[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4665-4674.
    [42]
    ZHANG Y H, WANG L J, QI J Q, et al. Structured Siamese network for real-time visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 355-370.
    [43]
    LI F, TIAN C, ZUO W M, et al. Learning spatial-temporal regularized correlation filters for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4904-4913.
    [44]
    CHOI J, CHANG H J, FISCHER T, et al. Context-aware deep feature compression for high-speed visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 479-488.
