Abstract: To improve the tracking capability of the fully convolutional Siamese network (SiamFC) tracker in complex scenes and to alleviate target drift during tracking, a real-time object tracking algorithm coupled with a spatial attention mechanism is proposed. Building on the SiamFC framework, an improved visual geometry group (VGG) network is used as the backbone to strengthen the tracker's modeling of deep target features. The self-attention mechanism is optimized into a plug-and-play, lightweight single convolution attention module (SCAM), which decomposes spatial attention into two parallel one-dimensional feature encoding processes, reducing the computational complexity of spatial attention. The initial target template is retained as the first template, a second template is dynamically selected by analyzing changes of the connected regions in the tracking response map, and the target is located after fusing the two templates. Experimental results show that, compared with SiamFC, the proposed algorithm improves the success rate on the OTB100, LaSOT, and UAV123 datasets by 0.082, 0.045, and 0.045, respectively, and the precision by 0.118, 0.051, and 0.062. On the VOT2018 dataset, the proposed algorithm improves accuracy, robustness, and expected average overlap (EAO) over SiamFC by 0.029, 0.276, and 0.134, respectively. The tracker runs at 70 frames per second, which satisfies real-time tracking requirements.
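The core SCAM idea described above, factoring spatial attention into two parallel one-dimensional encodings instead of one quadratic-cost two-dimensional map, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the average pooling, the shared three-tap smoothing kernel, and the recombination of the two 1-D maps by broadcasting are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(seq, kernel):
    """'Same' 1-D convolution along the last axis of a (C, L) array."""
    pad = len(kernel) // 2
    padded = np.pad(seq, ((0, 0), (pad, pad)), mode="edge")
    out = np.empty_like(seq)
    for i in range(seq.shape[-1]):
        out[:, i] = padded[:, i:i + len(kernel)] @ kernel
    return out

def scam_like_attention(x, kernel=np.array([0.25, 0.5, 0.25])):
    """Hypothetical sketch of spatial attention decomposed into two
    parallel 1-D feature encodings, applied to a (C, H, W) feature map.

    A full 2-D attention map costs O(H*W) per position; encoding the
    height and width axes separately costs only O(H) + O(W).
    """
    # Encode along height: average over the width axis -> (C, H)
    a_h = sigmoid(conv1d_same(x.mean(axis=2), kernel))
    # Encode along width: average over the height axis -> (C, W)
    a_w = sigmoid(conv1d_same(x.mean(axis=1), kernel))
    # Recombine the two 1-D maps by broadcasting and reweight the input
    return x * a_h[:, :, None] * a_w[:, None, :]
```

The same single kernel serves both branches, which is one plausible reading of "single convolution" in SCAM; the paper's exact module may differ.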
Key words: object tracking / Siamese network / attention mechanism / model update / deep learning
Table 1. Experimental results for different $\alpha$ values on the OTB100 dataset

| $\alpha$ | Success rate | Precision |
| --- | --- | --- |
| 0.50 | 0.655 | 0.870 |
| 0.55 | 0.669 | 0.890 |
| 0.60 | 0.651 | 0.869 |
| 0.65 | 0.658 | 0.878 |
| 0.70 | 0.659 | 0.876 |
| 0.75 | 0.660 | 0.870 |
| 0.80 | 0.655 | 0.875 |
| 0.85 | 0.659 | 0.875 |
| 0.90 | 0.659 | 0.880 |
| 0.95 | 0.653 | 0.866 |
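The weight $\alpha$ swept in Table 1 balances the two templates, with $\alpha = 0.55$ giving the best success rate and precision on OTB100. A common formulation, sketched below as an assumption (the paper only reports the weight sweep, not the exact fusion rule), is a convex combination of the two templates' response maps, with the target located at the peak of the fused map:

```python
import numpy as np

def fuse_responses(r_init, r_dynamic, alpha=0.55):
    """Convex combination of the two template response maps.

    alpha weights the response of the initial (first) template and
    1 - alpha the dynamically selected second template; alpha = 0.55
    performed best on OTB100 (Table 1). The fusion rule itself is an
    illustrative assumption.
    """
    return alpha * r_init + (1.0 - alpha) * r_dynamic

def locate_target(response):
    """Target position is taken as the peak of the fused response map."""
    return np.unravel_index(np.argmax(response), response.shape)
```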
Table 2. Experimental results of different algorithms on the VOT2018 dataset

| Algorithm | Accuracy | Robustness | EAO |
| --- | --- | --- | --- |
| SiamFC[13] | 0.503 | 0.585 | 0.188 |
| DSiam[36] | 0.512 | 0.646 | 0.196 |
| UpdateNet[18] | 0.518 | 0.454 | 0.244 |
| CCOT[37] | 0.494 | 0.318 | 0.267 |
| ECO[27] | 0.484 | 0.276 | 0.280 |
| SiamVGG[38] | 0.531 | 0.318 | 0.286 |
| DeepCSRDCF[39] | 0.489 | 0.276 | 0.293 |
| Ours | 0.532 | 0.309 | 0.322 |
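The abstract's second-template selection analyzes how connected regions in the response map change: a single compact peak region suggests a reliable result, while multiple regions hint at distractors or drift. A minimal sketch of counting 4-connected regions above a threshold follows; the threshold value and the decision rule built on the count are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def count_connected_regions(response, thresh=0.5):
    """Count 4-connected regions above `thresh` in a 2-D response map.

    Illustrative sketch: one region above threshold suggests the current
    result is reliable enough to serve as the second template; several
    regions suggest distractors. Threshold and rule are assumptions.
    """
    mask = response > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    regions = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                regions += 1                      # new region found
                stack = [(i, j)]
                seen[i, j] = True
                while stack:                      # flood fill the region
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
    return regions
```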
Table 3. Results of the ablation study

| Variant | Success rate | Precision |
| --- | --- | --- |
| SiamFC | 0.587 | 0.772 |
| SiamFC-V | 0.624 | 0.833 |
| SiamFC-V-A | 0.657 | 0.875 |
| SiamFC-V-A-U | 0.669 | 0.890 |
[1] GAO M, JIN L S, JIANG Y Y, et al. Manifold Siamese network: A novel visual tracking ConvNet for autonomous vehicles[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(4): 1612-1623. doi: 10.1109/TITS.2019.2930337
[2] KOU Z, WU J F, WANG H L, et al. Obstacle visual sensing based on deep learning for low-altitude small unmanned aerial vehicles[J]. Scientia Sinica (Informationis), 2020, 50(5): 692-703 (in Chinese). doi: 10.1360/N112019-00034
[3] MARVASTI-ZADEH S M, CHENG L, GHANEI-YAKHDAN H, et al. Deep learning for visual tracking: A comprehensive survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 3943-3968. doi: 10.1109/TITS.2020.3046478
[4] LUO J H, HAN Y, FAN L Y. Underwater acoustic target tracking: A review[J]. Sensors, 2018, 18(2): 112. doi: 10.3390/s18010112
[5] LIU F, SUN Y N, WANG H J, et al. Adaptive UAV target tracking algorithm based on residual learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(10): 1874-1882 (in Chinese). doi: 10.13700/j.bh.1001-5965.2019.0551
[6] HAN M, WANG J Q, WANG J T, et al. Comprehensive survey on target tracking based on Siamese network[J]. Journal of Hebei University of Science and Technology, 2022, 43(1): 27-41 (in Chinese).
[7] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. doi: 10.1109/TPAMI.2014.2345390
[8] ZHANG K H, ZHANG L, LIU Q S, et al. Fast visual tracking via dense spatio-temporal context learning[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2014: 127-141.
[9] DANELLJAN M, KHAN F S, FELSBERG M, et al. Adaptive color attributes for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 1090-1097.
[10] MA C, HUANG J B, YANG X K, et al. Hierarchical convolutional features for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 3074-3082.
[11] WANG L J, OUYANG W L, WANG X G, et al. Visual tracking with fully convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 3119-3127.
[12] BHAT G, JOHNANDER J, DANELLJAN M, et al. Unveiling the power of deep tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 493-509.
[13] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[14] HE A F, LUO C, TIAN X M, et al. A twofold Siamese network for real-time object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4834-4843.
[15] PU L, FENG X X, HOU Z Q, et al. SiamDA: Dual attention Siamese network for real-time visual tracking[J]. Signal Processing: Image Communication, 2021, 95: 116293. doi: 10.1016/j.image.2021.116293
[16] GUPTA D K, ARYA D, GAVVES E. Rotation equivariant Siamese networks for tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 12357-12366.
[17] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5000-5008.
[18] ZHANG L C, GONZALEZ-GARCIA A, VAN DE WEIJER J, et al. Learning the model update for Siamese trackers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 4009-4018.
[19] ZHU Z, WU W, ZOU W, et al. End-to-end flow correlation tracking with spatial-temporal attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 548-557.
[20] ZHANG C Y, WANG H, WEN J W, et al. Deeper Siamese network with stronger feature representation for visual tracking[J]. IEEE Access, 2020, 8: 119094-119104. doi: 10.1109/ACCESS.2020.3005511
[21] WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[22] CAO Y, XU J R, LIN S, et al. GCNet: Non-local networks meet squeeze-excitation networks and beyond[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 1971-1980.
[23] WANG M M, LIU Y, HUANG Z Y. Large margin object tracking with circulant feature maps[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4800-4808.
[24] DANELLJAN M, HÄGER G, KHAN F S, et al. Discriminative scale space tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8): 1561-1575. doi: 10.1109/TPAMI.2016.2609928
[25] BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: Complementary learners for real-time tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1401-1409.
[26] DANELLJAN M, HÄGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 4310-4318.
[27] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: Efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6931-6939.
[28] DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 58-66.
[29] ZHANG Z P, PENG H W. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4586-4595.
[30] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[31] BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6181-6190.
[32] ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 103-119.
[33] DANELLJAN M, VAN GOOL L, TIMOFTE R. Probabilistic regression for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 7181-7190.
[34] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4655-4664.
[35] LI P X, CHEN B Y, OUYANG W L, et al. GradNet: Gradient-guided network for visual object tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6161-6170.
[36] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1781-1789.
[37] DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: Learning continuous convolution operators for visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 472-488.
[38] YIN Z J, WEN C H, HUANG Z Y, et al. SiamVGG-LLC: Visual tracking using LLC and deeper Siamese networks[C]//Proceedings of the IEEE International Conference on Communication Technology. Piscataway: IEEE Press, 2020: 1683-1687.
[39] LUKEŽIC A, VOJÍR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4847-4856.
[40] XU T Y, FENG Z H, WU X J, et al. Joint group feature selection and discriminative filter learning for robust visual object tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 7949-7959.
[41] DAI K N, WANG D, LU H C, et al. Visual tracking via adaptive spatially-regularized correlation filters[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4665-4674.
[42] ZHANG Y H, WANG L J, QI J Q, et al. Structured Siamese network for real-time visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 355-370.
[43] LI F, TIAN C, ZUO W M, et al. Learning spatial-temporal regularized correlation filters for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4904-4913.
[44] CHOI J, CHANG H J, FISCHER T, et al. Context-aware deep feature compression for high-speed visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 479-488.