-
Abstract: To address the insufficient representational capability and the lack of online updating of the fully-convolutional Siamese network (SiamFC) tracker in complex scenes, this paper proposes a visual tracking algorithm based on multi-attention and dual-template updating. First, the feature extraction network is constructed by replacing AlexNet with VGG16 and substituting SoftPool for the max-pooling layers. Second, a multi-attention module (MAM) is added after the backbone network to enhance its ability to extract object features. Finally, a dual template is designed for feature fusion and response-map fusion, and the average peak-to-correlation energy (APCE) is used to decide whether to update the dynamic template, which effectively improves tracking robustness. The proposed algorithm is trained on the GOT-10k dataset and tested on the OTB2015, VOT2018, and UAV123 datasets. Experimental results show that, compared with the baseline SiamFC, the success rate of the proposed algorithm increases by 0.085 and 0.037, and the precision by 0.118 and 0.058, on OTB2015 and UAV123, respectively; on VOT2018, the accuracy, robustness, and expected average overlap (EAO) improve by 0.030, 0.295, and 0.139, respectively. The proposed algorithm achieves high tracking accuracy in complex scenes and runs at 33.9 frames/s, meeting real-time requirements.
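The APCE criterion mentioned in the abstract can be sketched concretely. The formula below is the standard APCE definition (a sharpness measure of the response map: a high value indicates a single confident peak); the threshold rule and the `ratio` parameter are illustrative assumptions, since this excerpt does not give the paper's exact decision rule.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a response map.
    A high APCE means one sharp peak over a flat background,
    i.e. a reliable detection."""
    f_max = response.max()
    f_min = response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

def should_update(response, history_mean, ratio=0.6):
    """Hypothetical update rule: refresh the dynamic template only
    when the current APCE reaches a fraction (`ratio`, assumed value)
    of its historical mean, so unreliable frames are skipped."""
    return apce(response) >= ratio * history_mean
```

A sharp single-peak map scores far higher than a nearly flat one, which is what makes APCE usable as an update gate.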
-
Key words:
- visual object tracking
- Siamese network
- SoftPool
- attention mechanism
- feature fusion
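SoftPool, listed among the key words, replaces max pooling by weighting each activation in the window with its softmax, so sub-maximal detail survives downsampling. A minimal 1-D sketch following the published definition (the 2-D version used in the paper's backbone applies the same weighting per window):

```python
import numpy as np

def softpool1d(x, kernel=2, stride=2):
    """SoftPool (Stergiou et al.): each window's activations are
    combined with softmax weights instead of taking the maximum."""
    out = []
    for start in range(0, len(x) - kernel + 1, stride):
        w = np.asarray(x[start:start + kernel], dtype=float)
        e = np.exp(w - w.max())            # numerically stable softmax weights
        out.append(float((e * w).sum() / e.sum()))
    return out
```

For a window like [1, 3] the output lies between the mean (2) and the max (3): the dominant activation is emphasized without discarding the rest.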
-
Table 1. Experimental results of different $\mu$ values on the OTB2015 dataset

| $\mu$ | Success rate | Precision |
| --- | --- | --- |
| 0.60 | 0.657 | 0.884 |
| 0.65 | 0.664 | 0.892 |
| 0.70 | 0.651 | 0.875 |
| 0.75 | 0.658 | 0.882 |
| 0.80 | 0.659 | 0.884 |
| 0.85 | 0.667 | 0.899 |
| 0.90 | 0.661 | 0.885 |
| 0.95 | 0.657 | 0.880 |
| 1.00 | 0.655 | 0.881 |
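Table 1 sweeps the weight $\mu$, with the best result at 0.85. The paper's exact fusion formula is not reproduced in this excerpt; the sketch below assumes the common convex combination of the static- and dynamic-template response maps, which is consistent with $\mu$ ranging over (0, 1] ($\mu = 1$ reduces to the static template only).

```python
import numpy as np

def fuse_responses(r_static, r_dynamic, mu=0.85):
    """Hypothetical linear fusion of the two template responses.
    mu = 0.85 is the best value from Table 1; the convex-combination
    form itself is an assumption, not stated in this excerpt."""
    return mu * r_static + (1.0 - mu) * r_dynamic
```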
Table 2. Success rate of different attributes on OTB2015 dataset
| Algorithm | Background clutter | Deformation | Fast motion | Illumination variation | In-plane rotation | Low resolution | Motion blur | Occlusion | Out-of-plane rotation | Out of view | Scale variation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ours | 0.663 | 0.624 | 0.664 | 0.690 | 0.655 | 0.678 | 0.691 | 0.636 | 0.647 | 0.621 | 0.644 |
| ATOM[10] | 0.619 | 0.631 | 0.642 | 0.655 | 0.635 | 0.705 | 0.659 | 0.637 | 0.629 | 0.609 | 0.671 |
| DiMP18[11] | 0.616 | 0.617 | 0.674 | 0.654 | 0.650 | 0.611 | 0.679 | 0.636 | 0.635 | 0.628 | 0.675 |
| DaSiamRPN[25] | 0.625 | 0.599 | 0.655 | 0.670 | 0.669 | 0.594 | 0.651 | 0.577 | 0.636 | 0.605 | 0.652 |
| ECO-HC[13] | 0.636 | 0.595 | 0.634 | 0.640 | 0.582 | 0.533 | 0.627 | 0.629 | 0.608 | 0.594 | 0.611 |
| GradNet[26] | 0.611 | 0.572 | 0.625 | 0.643 | 0.627 | 0.673 | 0.647 | 0.617 | 0.629 | 0.585 | 0.614 |
| DeepSRDCF[6] | 0.627 | 0.566 | 0.628 | 0.621 | 0.589 | 0.564 | 0.643 | 0.602 | 0.607 | 0.555 | 0.606 |
| SiamFC-VGG[27] | 0.591 | 0.603 | 0.600 | 0.631 | 0.623 | 0.689 | 0.630 | 0.576 | 0.614 | 0.533 | 0.619 |
| SiamRPN[9] | 0.591 | 0.617 | 0.600 | 0.649 | 0.628 | 0.642 | 0.623 | 0.586 | 0.625 | 0.544 | 0.615 |
| SiamDW[8] | 0.574 | 0.560 | 0.630 | 0.622 | 0.606 | 0.598 | 0.654 | 0.602 | 0.612 | 0.592 | 0.614 |
| SRDCF[6] | 0.583 | 0.544 | 0.597 | 0.613 | 0.544 | 0.514 | 0.541 | 0.559 | 0.550 | 0.474 | 0.561 |
| SiamFC[7] | 0.527 | 0.512 | 0.571 | 0.572 | 0.559 | 0.621 | 0.555 | 0.550 | 0.561 | 0.511 | 0.557 |
| Staple[28] | 0.560 | 0.551 | 0.540 | 0.592 | 0.548 | 0.394 | 0.594 | 0.543 | 0.533 | 0.460 | 0.521 |
Table 3. Precision of different attributes on OTB2015 dataset
| Algorithm | Background clutter | Deformation | Fast motion | Illumination variation | In-plane rotation | Low resolution | Motion blur | Occlusion | Out-of-plane rotation | Out of view | Scale variation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ours | 0.895 | 0.870 | 0.890 | 0.914 | 0.912 | 0.983 | 0.912 | 0.859 | 0.903 | 0.845 | 0.884 |
| ATOM[10] | 0.806 | 0.856 | 0.828 | 0.866 | 0.869 | 0.993 | 0.855 | 0.835 | 0.851 | 0.808 | 0.877 |
| DiMP18[11] | 0.801 | 0.823 | 0.857 | 0.849 | 0.866 | 0.856 | 0.850 | 0.835 | 0.843 | 0.820 | 0.872 |
| DaSiamRPN[25] | 0.843 | 0.814 | 0.858 | 0.895 | 0.913 | 0.922 | 0.856 | 0.764 | 0.867 | 0.787 | 0.868 |
| ECO-HC[13] | 0.850 | 0.806 | 0.829 | 0.820 | 0.800 | 0.888 | 0.802 | 0.848 | 0.834 | 0.818 | 0.822 |
| GradNet[26] | 0.822 | 0.795 | 0.838 | 0.844 | 0.860 | 0.999 | 0.855 | 0.838 | 0.872 | 0.789 | 0.841 |
| DeepSRDCF[6] | 0.841 | 0.783 | 0.814 | 0.791 | 0.818 | 0.847 | 0.823 | 0.825 | 0.835 | 0.781 | 0.819 |
| SiamFC-VGG[27] | 0.761 | 0.828 | 0.777 | 0.817 | 0.845 | 0.997 | 0.817 | 0.764 | 0.833 | 0.706 | 0.834 |
| SiamRPN[9] | 0.799 | 0.825 | 0.789 | 0.859 | 0.854 | 0.978 | 0.816 | 0.780 | 0.851 | 0.726 | 0.838 |
| SiamDW[8] | 0.762 | 0.763 | 0.808 | 0.794 | 0.824 | 0.901 | 0.841 | 0.798 | 0.829 | 0.781 | 0.819 |
| SRDCF[6] | 0.775 | 0.734 | 0.768 | 0.792 | 0.745 | 0.760 | 0.765 | 0.734 | 0.741 | 0.594 | 0.745 |
| SiamFC[7] | 0.692 | 0.691 | 0.744 | 0.736 | 0.743 | 0.900 | 0.707 | 0.723 | 0.758 | 0.673 | 0.736 |
| Staple[28] | 0.749 | 0.752 | 0.708 | 0.783 | 0.768 | 0.690 | 0.698 | 0.726 | 0.737 | 0.664 | 0.726 |
Table 4. Ablation experiment of the proposed algorithm on OTB2015 dataset
Table 5. Comparisons of performance among different algorithms on VOT2018 dataset
Table 6. Comparisons of performance among different algorithms on UAV123 dataset
-
[1] DONG E Z, ZHANG Y, DU S Z. An automatic object detection and tracking method based on video surveillance[C]//Proceedings of the IEEE International Conference on Mechatronics and Automation. Piscataway: IEEE Press, 2020: 1140-1144.
[2] ZHAI L, WANG C P, HOU Y H, et al. MPC-based integrated control of trajectory tracking and handling stability for intelligent driving vehicle driven by four hub motor[J]. IEEE Transactions on Vehicular Technology, 2022, 71(3): 2668-2680. doi: 10.1109/TVT.2022.3140240
[3] MANGAL N K, TIWARI A K. Kinect v2 tracked body joint smoothing for kinematic analysis in musculoskeletal disorders[C]//Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society. Piscataway: IEEE Press, 2020: 5769-5772.
[4] DEWANGAN D K, SAHU S P. Real time object tracking for intelligent vehicle[C]//Proceedings of the 1st International Conference on Power, Control and Computing Technologies. Piscataway: IEEE Press, 2020: 134-138.
[5] DANELLJAN M, HÄGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision Workshop. Piscataway: IEEE Press, 2015: 621-629.
[6] DANELLJAN M, HÄGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 4310-4318.
[7] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[8] ZHANG Z P, PENG H W. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4586-4595.
[9] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[10] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: accurate tracking by overlap maximization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4655-4664.
[11] BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6181-6190.
[12] WANG Q, TENG Z, XING J L, et al. Learning attentions: residual attentional Siamese network for high performance online visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4854-4863.
[13] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6931-6939.
[14] ZHANG L C, GONZALEZ-GARCIA A, VAN DE WEIJER J, et al. Learning the model update for Siamese trackers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4009-4018.
[15] ZHANG C Y, WANG H, WEN J W, et al. Deeper Siamese network with stronger feature representation for visual tracking[J]. IEEE Access, 2020, 8: 119094-119104. doi: 10.1109/ACCESS.2020.3005511
[16] STERGIOU A, POPPE R, KALLIATAKIS G. Refining activation downsampling with SoftPool[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 10337-10346.
[17] HUANG L H, ZHAO X, HUANG K Q. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577. doi: 10.1109/TPAMI.2019.2957464
[18] WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. doi: 10.1109/TPAMI.2014.2388226
[19] KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking VOT2018 challenge results[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2019: 3-53.
[20] MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 445-461.
[21] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[22] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
[23] YANG Z X, ZHU L C, WU Y, et al. Gated channel transformation for visual recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11794-11803.
[24] HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13708-13717.
[25] ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 103-119.
[26] LI P X, CHEN B Y, OUYANG W L, et al. GradNet: gradient-guided network for visual object tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6161-6170.
[27] LI Y H, ZHANG X F, CHEN D M. SiamVGG: visual tracking using deeper Siamese networks[EB/OL]. (2022-07-04)[2023-02-01]. https://arxiv.org/abs/1902.02804v4.
[28] BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1401-1409.
[29] DANELLJAN M, ROBINSON A, SHAHBAZ KHAN F, et al. Beyond correlation filters: learning continuous convolution operators for visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 472-488.
[30] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1781-1789.
[31] WANG N, ZHOU W G, TIAN Q, et al. Multi-cue correlation filters for robust visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4844-4853.
[32] HUANG Z Y, FU C H, LI Y M, et al. Learning aberrance repressed correlation filters for real-time UAV tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 2891-2900.
[33] LI Y M, FU C H, DING F Q, et al. AutoTrack: towards high-performance visual tracking for UAV with automatic spatio-temporal regularization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11920-11929.
[34] LI F, TIAN C, ZUO W M, et al. Learning spatial-temporal regularized correlation filters for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4904-4913.