Abstract:
Tracking methods based on Siamese networks train the tracking model offline and require no online model update, balancing tracking accuracy and speed. Existing Siamese network trackers use a fixed threshold to select positive and negative training samples, which easily leaves qualified training samples unselected; in addition, the low correlation between the classification branch and the regression branch during training hinders learning a high-precision tracking model. To address this, an object tracking method based on an intersection-over-union (IoU)-constrained Siamese network is proposed. A dynamic threshold strategy adjusts the thresholds that define positive and negative training samples according to statistical characteristics of the IoU between the predefined anchor boxes and the ground-truth boxes, improving tracking accuracy. The proposed method further replaces the classification branch with an IoU quality assessment branch, which reflects the target position through the IoU between the anchor box and the ground-truth box, improving tracking accuracy while reducing the number of model parameters. Comparative experiments on the VOT2016, OTB-100, VOT2019, and UAV123 datasets show that the proposed method performs well. On VOT2016, its accuracy is 0.017 higher than that of SiamRPN; its expected average overlap is 0.463, only 0.001 lower than that of SiamRPN++, at a real-time speed of up to 220 frame/s.
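The dynamic threshold strategy described in the abstract can be sketched as follows. This is a minimal illustration assuming an ATSS-style rule (the positive/negative threshold is the mean plus one standard deviation of the anchor-to-ground-truth IoUs, in the spirit of reference [3]); the function names and the exact selection rule are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def iou_with_gt(anchors, gt):
    """IoU of each anchor (N, 4) in [x1, y1, x2, y2] format with one ground-truth box (4,)."""
    x1 = np.maximum(anchors[:, 0], gt[0])
    y1 = np.maximum(anchors[:, 1], gt[1])
    x2 = np.minimum(anchors[:, 2], gt[2])
    y2 = np.minimum(anchors[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def select_training_samples(anchors, gt):
    """Label anchors with a dynamic threshold derived from IoU statistics.

    Instead of a fixed cutoff (e.g. 0.6), the threshold adapts to the
    mean and standard deviation of the IoU distribution for this target.
    """
    ious = iou_with_gt(anchors, gt)
    thr = ious.mean() + ious.std()      # dynamic, per-target threshold
    pos = ious >= thr                   # positive training samples
    neg = ~pos                          # negative training samples
    # The IoU values themselves can serve as regression targets for an
    # IoU quality assessment branch in place of classification scores.
    return pos, neg, ious
```

A fixed threshold discards borderline anchors when the target's shape makes all IoUs low; tying the threshold to the IoU statistics keeps the best-matching anchors as positives regardless of the absolute IoU scale.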
Keywords:
- Object tracking /
- Deep learning /
- Siamese network /
- Intersection over union (IoU) constraint /
- Dynamic threshold
Table 1. Experimental results of different methods on the VOT2016 dataset

| Method | Accuracy | Robustness | Expected average overlap | Parameters/MB | Speed/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| Proposed method | 0.635 | 0.200 | 0.463 | 41.8 | 220 |
| SiamBAN[28] | 0.666 | 0.144 | 0.505 | 410 | 54.53 |
| SiamMask[29] | 0.643 | 0.219 | 0.455 | 82.1 | 55 |
| SiamFC++[4] | 0.612 | 0.266 | 0.357 | 71.24 | 90 |
| SiamRPN++[16] | 0.640 | 0.200 | 0.464 | 206 | 35 |
| SiamRPN[5] | 0.618 | 0.238 | 0.393 | 23.8 | 180 |
| DaSiamRPN[30] | 0.610 | 0.220 | 0.411 | 86.3 | 160 |
| ATOM[31] | 0.610 | 0.187 | 0.430 | 108 | 30 |
| SiamFC[7] | 0.530 | 0.460 | 0.235 | 8.92 | 86 |
Table 2. Experimental results of different methods on VOT2019 dataset
[1] ZHOU Q L, ZHANG W J, ZHAO L P, et al. Cross-modal object tracking algorithm based on pedestrian attribute[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(9): 1635-1642 (in Chinese).
[2] LUO Y, XIAO H, OU J X. Research on target tracking technology based on deep learning[J]. Semiconductor Optoelectronics, 2020, 41(6): 757 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-BDTG202006001.htm
[3] ZHANG S F, CHI C, YAO Y Q, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 9759-9768.
[4] XU Y D, WANG Z Y, LI Z X, et al. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020: 12549-12556.
[5] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[6] TAO R, GAVVES E, SMEULDERS A W M. Siamese instance search for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1420-1429.
[7] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[8] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
[9] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2805-2813.
[10] WANG Q, TENG Z, XING J L, et al. Learning attentions: Residual attentional Siamese network for high performance online visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4854-4863.
[11] HE A F, LUO C, TIAN X M, et al. A twofold Siamese network for real-time object tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4834-4843.
[12] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
[13] FAN H, LING H B. Siamese cascaded region proposal networks for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7952-7961.
[14] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems, 2012: 1097-1105.
[15] ZHANG Z P, PENG H W. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4591-4600.
[16] LI B, WU W, WANG Q, et al. SiamRPN++: Evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4282-4291.
[17] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[18] GUO D Y, WANG J, CUI Y, et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6269-6277.
[19] LI X, WANG W H, WU L J, et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection[EB/OL]. (2020-06-08)[2021-09-01]. https://arxiv.org/abs/2006.04388.
[20] WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. doi: 10.1109/TPAMI.2014.2388226
[21] HADFIELD S J, BOWDEN R, LEBDA K. The visual object tracking VOT2016 challenge results[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 777-823.
[22] KRISTAN M, MATAS J, LEONARDIS A, et al. The seventh visual object tracking VOT2019 challenge results[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE Press, 2019: 2206-2241.
[23] MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 445-461.
[24] REAL E, SHLENS J, MAZZOCCHI S, et al. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5296-5305.
[25] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 740-755.
[26] HUANG L H, ZHAO X, HUANG K Q. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild[EB/OL]. (2019-11-20)[2021-09-01]. https://arxiv.org/abs/1810.11981v2.
[27] FAN H, LIN L T, YANG F, et al. LaSOT: A high-quality benchmark for large-scale single object tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5374-5383.
[28] CHEN Z D, ZHONG B N, LI G R, et al. Siamese box adaptive network for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6668-6677.
[29] WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: A unifying approach[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1328-1338.
[30] ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 101-117.
[31] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4660-4669.
[32] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4293-4302.
[33] DANELLJAN M, BHAT G, SHAHBAZ K F, et al. ECO: Efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6638-6646.
[34] BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: Complementary learners for real-time tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1401-1409.
[35] DANELLJAN M, HAGER G, SHAHBAZ K F, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 4310-4318.
[36] WANG G T, LUO C, XIONG Z W, et al. SPM-tracker: Series-parallel matching for real-time visual object tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3643-3652.
[37] DANELLJAN M, HAGER G, KHAN F, et al. Accurate scale estimation for robust visual tracking[C]//British Machine Vision Conference. Berlin: Springer, 2014: 1-11.