-
摘要:
针对类内干扰影响基于个体人员特征目标跟踪算法的精确性和鲁棒性问题,分析当前跟踪算法在个体人员跟踪方面存在的不足,提出了利用语言先验知识引导辅助跟踪器的方法。在视觉跟踪器的基础上增加语言引导分支,对跟踪目标产生注意力,从而减少对类内干扰的影响。利用位置置信度进行回归目标框定位的方法解决基于孪生网络目标跟踪算法中利用分类置信度定位候选目标框的局限性,实现跨模态信息融合提升特定目标跟踪的精度。为提升所提模型对特定人员目标跟踪的针对性,构建了跨模态的人员目标跟踪数据集用于训练和验证。实验表明:所提模型应用于个体人员跟踪时表现更佳,其有效性得到了证明。
Abstract:The accuracy and robustness of the object tracking algorithm have been influenced by the intra-class interference when tracking pedestrian. In this paper, we analyze the drawbacks of current tracking algorithms and propose a model to combine the visual feature and language priori to improve the performance of the tracker.The language guided branch is added to supervise the visual tracking branch by generating the attention, so the intra-class interference can be alleviated.We also propose a method to improve the accuracy of thecross-modal object tracking based on the location confidence instead of classification confidence for siamese trackers.To validate our method, we customize the dataset specialized for pedestrian tracking. The experiment shows the effectiveness of this model.
-
Key words:
- target tracking /
- siamese network /
- cross-modal /
- dataset /
- language priori
-
表 1 语言引导模块评估结果对比
Table 1. Comparison results of language guided module
模型参数类型 平均交并比 参数0 0.241 3 参数1 0.359 8 参数2 0.349 8 优化参数 0.465 0 -
[1] BERTINETTO L, VALMADRE J, HENRIQUE J F, et al.Fully-convolutional siamese networks for object tracking[C]//European Conference on Computer Vision.Berlin: Springer, 2016: 850-865. [2] LI B, YAN J, WU W, et al.High performance visual tracking with siamese region proposal network[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2018: 8971-8980. [3] KOSIOREK A R, BEWLEY A, POSNER I, et al.Hierarchical attentive recurrent tracking[C]//Neural Information Processing Systems, 2017, 36: 3053-3061. [4] ZHANG Z, PENG H.Deeper and wider siamese networks for real-time visual tracking[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2019: 4591-4600. [5] LI B, WU W, WANG Q, et al.Evolution of siamese visual tracking with very deep networks[J].IEEE Computer Vision and Pattern Recognition, 2019, 35(9):4282-4291. http://ieeexplore.ieee.org/document/8954116 [6] ZHU Z, WANG Q, LI B, et al.Distractor-aware siamese networks for visual object tracking[C]//European Conference on Computer Vision.Berlin: Springer, 2018: 103-119. [7] REN L, YUAN X, LU J, et al.Deep reinforcement learning with iterative shift for visual tracking[C]//European Conference on Computer Vision.Berlin: Springer, 2018: 684-700. [8] ZHANG L, GONZALEZGARCIA A, DE WEIJER J V, et al.Learning the model update for siamese trackers[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway: IEEE Press, 2019: 4010-4019. [9] MOGADALA A, KALIMUTHU M, KLAKOWl D, et al.Trends in integration of vision and language research:A survey of tasks, datasets, and methods[J].IEEE Computer Vision and Pattern Recognition, 2019, 30(19):1183-1986. http://cn.bing.com/academic/profile?id=7797262cdf373568fdd2a8c589597f7a&encoded=0&v=paper_preview&mkt=zh-cn [10] HU R, ROHRBACH M, DARRELL T, et al.Segmentation from natural language expressions[C]//European Conference on Computer Vision.Berlin: Springer, 2016: 108-124. [11] LI Z, TAO R, GAVVES E, et al.Tracking by natural language specification[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2017: 7350-7358. [12] YU L, LIN Z, SHEN X, et al.Modular attention network for referring expression comprehension[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2018: 1307-1315. [13] SUN C, MYERS A, VONDRICK C, et al.A joint model for video and language representation learning[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway: IEEE Press, 2019: 7464-7473. [14] SU W, ZHU X, CAO Y, et al.Pre-training of generic visual-linguistic representations[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway: IEEE Press, 2019: 13-23. [15] WU Y, LIM J, YANG M, et al.Online object tracking: A benchmark[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2013: 2411-2418. [16] WU Y, LIM J, YANG M H.Object tracking benchmark[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1834-1848. doi: 10.1109/TPAMI.2014.2388226 [17] GALOOGAHI H K, FAGG A, HUANG C, et al.A benchmark for higher frame rate object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway: IEEE Press, 2017: 1134-1143. [18] MULLER M, BIBI A, GIANCOLA S, et al.A large-scale dataset and benchmark for object tracking in the wild[C]//European Conference on Computer Vision.Berlin: Springer, 2018: 310-327. [19] HUANG L, ZHAO X, HUANG K, et al.A large high-diversity benchmark for generic object tracking in the wild[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 45(21):1374-1391. http://cn.bing.com/academic/profile?id=d9720ea63f33d60fdb1cd01a86175d08&encoded=0&v=paper_preview&mkt=zh-cn [20] FAN H, LIN L, YANG F, et al.A high-quality benchmark for large-scale single object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway: IEEE Press, 2018: 5374-5383. [21] WANG Q, ZHANG L, BERTINETTO L, et al.Fast online object tracking and segmentation: A unifying approach[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2019: 1328-1338. [22] HE K, ZHANG X, REN S, et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2016: 770-778. [23] MARGFFOYTUAY E A, PEREZ J C, BOTERO E, et al.Dynamic multimodal instance segmentation guided by natural language queries[C]//European Conference on Computer Vision.Berlin: Springer, 2018: 656-672. [24] JIANG B, LUO R, MAO J, et al.Acquisition of localization confidence for accurate object detection[C]//European Conference on Computer Vision.Berlin: Springer, 2018: 816-832. [25] KAZEMZADE S, ORDONEZ V, MATTENV M, et al.Referring to objects in photographs of natural scene[C]//Empirical Methods in Natural Language Processing, 2014, 28: 787-789. [26] DANELLJIAN M, BHAT G, KHAN F S, et al.Efficient convolution operators for tracking[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2017: 6931-6939. [27] DANELLJIAN M, BHAT G, KHAN F S, et al.Accurate tracking by overlap maximization[C]//Proceedings of the IEEE Computer Vision and Pattern Recognition.Piscataway: IEEE Press, 2019: 4660-4669. [28] BHAT G, DANELLJAN M, VAN GOOL L, et al.Learning discriminative model prediction for tracking[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway: IEEE Press, 2019: 6182-6191.