-
Abstract: To reduce the impact of the appearance deformation caused by target motion on target tracking, an appearance- and action-adaptive tracking method is proposed on the basis of DaSiamese-RPN. First, an appearance-action adaptive update module is introduced into the subnetworks of the Siamese network to fuse the target's spatio-temporal information and action features. Second, two Euclidean distances are used to measure the global and local differences between the ground-truth and predicted feature maps, and the loss function is constructed as a weighted fusion of the two, which strengthens the correlation between the global and local information of the predicted and ground-truth target feature maps. Finally, experiments were conducted on the VOT2016, VOT2018, VOT2019, and OTB100 datasets. The results show that the expected average overlap (EAO) improves by 4.5% and 6.1% on VOT2016 and VOT2018, respectively; on VOT2019, accuracy improves by 0.4% while EAO decreases by 1%; on OTB100, the tracking success rate improves by 0.3% and precision improves by 0.2%.
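As a rough, hypothetical sketch of how such a weighted global/local Euclidean-distance loss could be composed, the PyTorch snippet below computes one global L2 distance over the whole feature map and one local, patch-wise L2 distance, then fuses them with a weight ω. The function name, tensor shapes, patch size, and the exact placement of ω are assumptions made for illustration; they are not the implementation reported in the paper.

```python
import torch
import torch.nn.functional as F

def global_local_feature_loss(pred_feat, gt_feat, omega=100.0, patch=4):
    """Illustrative sketch: weighted fusion of a global and a local Euclidean
    distance between predicted and ground-truth feature maps.

    pred_feat, gt_feat: (B, C, H, W) tensors -- shapes assumed for this example.
    omega: fusion weight between the two terms (the role of omega here is an
           assumption; only the swept values 0/100/500/1000 appear in Tables 1-4).
    patch: local window size for the patch-wise (local) term.
    """
    # Global term: L2 distance over each whole feature map, averaged over the batch.
    global_dist = torch.norm((pred_feat - gt_feat).flatten(1), dim=1).mean()

    # Local term: split both maps into non-overlapping patches and average the
    # per-patch L2 distances, so local deformations are not averaged away.
    pred_patches = F.unfold(pred_feat, kernel_size=patch, stride=patch)  # (B, C*patch*patch, L)
    gt_patches = F.unfold(gt_feat, kernel_size=patch, stride=patch)
    local_dist = torch.norm(pred_patches - gt_patches, dim=1).mean()

    # Weighted fusion of the global and local distance terms.
    return global_dist + omega * local_dist


# Example usage with random feature maps of an assumed size:
pred = torch.randn(2, 256, 16, 16)
gt = torch.randn(2, 256, 16, 16)
loss = global_local_feature_loss(pred, gt, omega=100.0)
```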
-
Table 1. Results of testing on the VOT2016 dataset

Model           Accuracy/%   Robustness/%   EAO/%
DaSiamese-RPN   61           22             41.1
Ours (ω=0)      62.7         21.4           42.5
Ours (ω=1000)   61.3         19.6           45.5
Ours (ω=500)    60.9         19.6           44.2
Ours (ω=100)    61.4         18.6           45.6

Table 2. Results of testing on the VOT2018 dataset

Model           Accuracy/%   Robustness/%   EAO/%
DaSiamese-RPN   56.9         33.7           32.6
Ours (ω=0)      58.4         29.5           35.2
Ours (ω=1000)   58.5         28.6           37.2
Ours (ω=500)    58.5         25.8           38.7
Ours (ω=100)    58.5         28.6           36.5

Table 3. Results of testing on the VOT2019 dataset

Model           Accuracy/%   Robustness/%   EAO/%
DaSiamese-RPN   58.2         52.7           27.2
Ours (ω=0)      58.3         54.7           26.7
Ours (ω=1000)   58.5         55.2           26.8
Ours (ω=500)    58.6         55.2           26.2
Ours (ω=100)    58.5         55.7           26

Table 4. Results of testing on the OTB100 dataset

Model           Success rate/%   Precision/%
DaSiamese-RPN   64.6             85.9
Ours (ω=0)      64.9             86.1
Ours (ω=1000)   64.5             85.5
Ours (ω=500)    64.6             85.8
Ours (ω=100)    64.8             86

Table 5. Comparison with different methods on the VOT2016 dataset

Table 6. Comparison with different methods on the VOT2018 dataset
-