Cross-modal object tracking algorithm based on pedestrian attributes

ZHOU Qianli; ZHANG Wenjing; ZHAO Luping; TIAN Naiqian; WANG Rong

doi:10.13700/j.bh.1001-5965.2020.0042


ZHOU Qianli^{1, 2}, ZHANG Wenjing^{1}, ZHAO Luping^{1}, TIAN Naiqian^{1}, WANG Rong^{1, 3}

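1. Police Information Engineering and Network Security College, People's Public Security University of China, Beijing 100038, China
2. Beijing Municipal Public Security Bureau, Beijing 100740, China
3. Key Laboratory of Security Technology and Risk Assessment, Ministry of Public Security, Beijing 100038, China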

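Funds:

National Key R&D Program of China (A19808)

Annual Major Project of the Basic Scientific Research Operating Expenses of the People's Public Security University of China (2019JKF111)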


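About the authors:
ZHOU Qianli, male, Ph.D. candidate. Research interest: computer vision.

ZHANG Wenjing, male, Ph.D. candidate. Research interest: computer vision.

ZHAO Luping, female, master's student. Research interest: single object tracking.

TIAN Naiqian, female, master's student. Research interest: single object tracking.

WANG Rong, female, Ph.D., professor, doctoral supervisor. Research interest: computer vision.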

Corresponding author:
WANG Rong. E-mail: dbdxwangrong@163.com

CLC number: TP183; TP389.1
Publication history
- Received: 2020-02-21
- Accepted: 2020-03-15
- Published online: 2020-09-20


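Abstract:
Intra-class interference degrades the accuracy and robustness of object tracking algorithms that rely on individual pedestrian attributes. This paper analyzes the shortcomings of current tracking algorithms in tracking individual pedestrians and proposes using language prior knowledge to guide and assist the tracker. A language-guided branch is added to the visual tracker to generate attention on the tracked target, which reduces the influence of intra-class interference. Siamese-network-based trackers locate the candidate target box by its classification confidence; to overcome this limitation, the proposed method regresses and locates the target box using location confidence, and fuses cross-modal information to improve the precision of tracking a specific target. To make the model better suited to tracking specific pedestrians, a cross-modal pedestrian tracking dataset is constructed for training and validation. Experiments show that the proposed model outperforms mainstream trackers on individual pedestrian tracking (average precision 0.930 and success rate 0.978 in Table 2), which demonstrates its effectiveness.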
- target tracking /
- siamese network /
- cross-modal /
- dataset /
- language prior
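The abstract says the language-guided branch generates attention on the target so that same-class distractors are suppressed. As a minimal sketch, assuming the branch yields a per-location attention map in [0, 1] aligned with the siamese response map, and assuming a simple multiplicative fusion, the effect on the selected peak can be illustrated in plain Python. The fusion rule, the `alpha` weight and both function names are hypothetical, not the authors' design.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a 2D score map, row-major


def apply_language_attention(response: Grid, attention: Grid, alpha: float = 1.0) -> Grid:
    """Reweight a visual response map with a language-derived attention map.

    Hypothetical fusion rule: r'(i, j) = r(i, j) * (1 + alpha * a(i, j)),
    where a(i, j) in [0, 1] is high where the referring expression matches.
    """
    if len(response) != len(attention) or any(
        len(r_row) != len(a_row) for r_row, a_row in zip(response, attention)
    ):
        raise ValueError("response and attention maps must have the same shape")
    return [
        [r * (1.0 + alpha * a) for r, a in zip(r_row, a_row)]
        for r_row, a_row in zip(response, attention)
    ]


def argmax_2d(grid: Grid) -> Tuple[int, int]:
    """Return the (row, col) position of the highest score."""
    _, row, col = max(
        (value, i, j) for i, row_vals in enumerate(grid) for j, value in enumerate(row_vals)
    )
    return row, col


# Toy 3x3 response: the true target at (1, 1), a same-class distractor at (2, 2).
response = [
    [0.1, 0.2, 0.1],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
]
# Hypothetical language attention (e.g. for "the man in the red jacket").
attention = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.1],
]

print(argmax_2d(response))                                       # (2, 2): distractor wins
print(argmax_2d(apply_language_attention(response, attention)))  # (1, 1): target wins
```

A multiplicative rule keeps locations the language branch ignores at their original score instead of zeroing them, so a wrong attention map cannot erase the visual evidence entirely.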
Abstract:
The accuracy and robustness of object tracking algorithms are degraded by intra-class interference when tracking pedestrians. In this paper, we analyze the drawbacks of current tracking algorithms and propose a model that combines visual features with a language prior to improve tracker performance. A language-guided branch is added to supervise the visual tracking branch by generating attention, which alleviates intra-class interference. We also propose a method that improves the accuracy of cross-modal object tracking by using location confidence instead of classification confidence in siamese trackers. To validate our method, we build a dataset specialized for pedestrian tracking. Experiments show the effectiveness of this model.
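The abstract contrasts locating the box by classification confidence with locating it by location confidence. The sketch below follows the spirit of the localization-confidence idea in [24] and assumes each candidate already carries a predicted localization confidence (here a made-up number standing in for a predicted IoU); the `Candidate` type and both selection helpers are hypothetical. It shows how the two criteria can pick different boxes when a partial box around a pedestrian's upper body scores highly as "person".

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Candidate:
    box: Box
    cls_score: float  # classification confidence: "is this a person?"
    loc_conf: float   # hypothetical predicted localization confidence (IoU estimate)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_by_classification(cands: List[Candidate]) -> Candidate:
    return max(cands, key=lambda c: c.cls_score)


def select_by_location(cands: List[Candidate]) -> Candidate:
    return max(cands, key=lambda c: c.loc_conf)


gt = (10, 10, 50, 90)  # full-body ground truth, used only to score each choice
upper_body = Candidate((10, 10, 50, 50), cls_score=0.95, loc_conf=0.45)
full_body = Candidate((12, 12, 50, 88), cls_score=0.90, loc_conf=0.85)
cands = [upper_body, full_body]

for name, pick in [("classification", select_by_classification(cands)),
                   ("location", select_by_location(cands))]:
    print(name, pick.box, round(iou(pick.box, gt), 4))
# classification (10, 10, 50, 50) 0.5
# location (12, 12, 50, 88) 0.9025
```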


Figure 1. Illustration of image segmentation guided by a visual referring expression


Figure 2. Overall framework of cross-modal object tracking


Figure 3. Regression prediction results of multiple modules


Figure 4. Generation of the minimum enclosing rectangle
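Figure 4 only names this step, so the following is one possible reading rather than the authors' procedure: assuming the minimum enclosing rectangle is the smallest axis-aligned box covering a set of points, it can be taken over the foreground pixels of a segmentation mask (cf. Figure 1) or over the corners of several regressed boxes (cf. Figure 3). All function names are hypothetical; a rotated minimum-area rectangle would need a different algorithm, such as rotating calipers over the convex hull.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def min_enclosing_rect(points: Iterable[Point]) -> Rect:
    """Smallest axis-aligned rectangle containing every point."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


def mask_points(mask: List[List[int]]) -> List[Point]:
    """(x, y) pixel indices of all foreground pixels in a binary mask."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def box_points(boxes: Iterable[Rect]) -> List[Point]:
    """Top-left and bottom-right corners of each box."""
    return [pt for x1, y1, x2, y2 in boxes for pt in ((x1, y1), (x2, y2))]


mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
# Inclusive pixel indices: the box spans 3 x 3 pixels.
print(min_enclosing_rect(mask_points(mask)))  # (1, 1, 3, 3)
# Union of two boxes, e.g. from two regression modules.
print(min_enclosing_rect(box_points([(10, 20, 50, 80), (15, 10, 60, 70)])))  # (10, 10, 60, 80)
```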


Figure 5. Comparison of results among mainstream trackers


Figure 6. OPE (one-pass evaluation) results of the proposed model and mainstream tracking algorithms


Figure 7. Visualization of results from different trackers


Figure 8. Tracking evaluation of language-based detection


Table 1. Comparison of evaluation results for the language-guided module

Model parameter setting	Mean IoU
Parameter set 0	0.2413
Parameter set 1	0.3598
Parameter set 2	0.3498
Optimized parameters	0.4650
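Table 1 reports mean IoU (intersection over union) for the language-guided module. A minimal sketch of that metric, assuming it is the per-frame box IoU between prediction and ground truth averaged over all evaluated frames, with boxes as (x1, y1, x2, y2); the paper may instead average mask IoU, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Average per-frame IoU; assumes exactly one prediction per ground-truth frame."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally long, non-empty lists")
    return sum(box_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 0, 15, 10)]
print(mean_iou(preds, gts))  # (1 + 1/3) / 2, about 0.667
```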


Table 2. Comparison of evaluation results between the proposed model and mainstream tracking algorithms

Algorithm	Average precision	Success rate
SiamRPN [2]	0.493	0.566
SiamRPN++ [5]	0.508	0.612
SiamMask [21]	0.708	0.808
ECO [26]	0.647	0.797
ATOM [27]	0.732	0.848
DiMP [28]	0.787	0.841
Proposed model	0.930	0.978
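Table 2 and Figure 6 use OPE (one-pass evaluation) from the OTB benchmark [15, 16], in which each tracker is initialized once from the first-frame ground truth and run through the whole sequence. The sketch below follows the common OTB conventions, which are an assumption about this paper's exact settings: precision is the fraction of frames whose center location error is at most 20 pixels, and the success score is the area under the success plot, computed as the mean success rate over 21 overlap thresholds from 0 to 1. Boxes are (x, y, w, h); the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), OTB convention


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between box centers, in pixels."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return math.hypot(ax - bx, ay - by)


def overlap(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def _check(preds: Sequence[Box], gts: Sequence[Box]) -> None:
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally long, non-empty prediction and ground-truth lists")


def precision_score(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose center location error is <= threshold pixels."""
    _check(preds, gts)
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


def success_auc(preds: Sequence[Box], gts: Sequence[Box], num_thresholds: int = 21) -> float:
    """Mean success rate over overlap thresholds 0, 0.05, ..., 1 (success-plot AUC)."""
    _check(preds, gts)
    ious = [overlap(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


gts = [(0, 0, 10, 10), (0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (30, 0, 10, 10)]
print(precision_score(preds, gts))  # 0.5
print(success_auc(preds, gts))      # about 0.476, i.e. 10/21
```

With one exact frame and one frame 30 px off with no overlap, precision is 0.5, and the success score is 10/21 rather than 0.5 because the exact frame fails the strict `>` test at the 1.0 threshold.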


References

[1]	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional siamese networks for object tracking[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[2]	LI B, YAN J, WU W, et al. High performance visual tracking with siamese region proposal network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[3]	KOSIOREK A R, BEWLEY A, POSNER I. Hierarchical attentive recurrent tracking[C]//Advances in Neural Information Processing Systems, 2017, 30: 3053-3061.
[4]	ZHANG Z, PENG H. Deeper and wider siamese networks for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4591-4600.
[5]	LI B, WU W, WANG Q, et al. SiamRPN++: Evolution of siamese visual tracking with very deep networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4282-4291. http://ieeexplore.ieee.org/document/8954116
[6]	ZHU Z, WANG Q, LI B, et al. Distractor-aware siamese networks for visual object tracking[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 103-119.
[7]	REN L, YUAN X, LU J, et al. Deep reinforcement learning with iterative shift for visual tracking[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 684-700.
[8]	ZHANG L, GONZALEZ-GARCIA A, VAN DE WEIJER J, et al. Learning the model update for siamese trackers[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4010-4019.
[9]	MOGADALA A, KALIMUTHU M, KLAKOW D. Trends in integration of vision and language research: A survey of tasks, datasets, and methods[J]. IEEE Computer Vision and Pattern Recognition, 2019, 30(19): 1183-1986. http://cn.bing.com/academic/profile?id=7797262cdf373568fdd2a8c589597f7a&encoded=0&v=paper_preview&mkt=zh-cn
[10]	HU R, ROHRBACH M, DARRELL T. Segmentation from natural language expressions[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 108-124.
[11]	LI Z, TAO R, GAVVES E, et al. Tracking by natural language specification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 7350-7358.
[12]	YU L, LIN Z, SHEN X, et al. MAttNet: Modular attention network for referring expression comprehension[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1307-1315.
[13]	SUN C, MYERS A, VONDRICK C, et al. VideoBERT: A joint model for video and language representation learning[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 7464-7473.
[14]	SU W, ZHU X, CAO Y, et al. VL-BERT: Pre-training of generic visual-linguistic representations[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 13-23.
[15]	WU Y, LIM J, YANG M H. Online object tracking: A benchmark[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2013: 2411-2418.
[16]	WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. doi: 10.1109/TPAMI.2014.2388226
[17]	GALOOGAHI H K, FAGG A, HUANG C, et al. Need for speed: A benchmark for higher frame rate object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1134-1143.
[18]	MÜLLER M, BIBI A, GIANCOLA S, et al. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 310-327.
[19]	HUANG L, ZHAO X, HUANG K. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 45(21): 1374-1391. http://cn.bing.com/academic/profile?id=d9720ea63f33d60fdb1cd01a86175d08&encoded=0&v=paper_preview&mkt=zh-cn
[20]	FAN H, LIN L, YANG F, et al. LaSOT: A high-quality benchmark for large-scale single object tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5374-5383.
[21]	WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: A unifying approach[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1328-1338.
[22]	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[23]	MARGFFOY-TUAY E A, PÉREZ J C, BOTERO E, et al. Dynamic multimodal instance segmentation guided by natural language queries[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 656-672.
[24]	JIANG B, LUO R, MAO J, et al. Acquisition of localization confidence for accurate object detection[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 816-832.
[25]	KAZEMZADEH S, ORDONEZ V, MATTEN M, et al. ReferItGame: Referring to objects in photographs of natural scenes[C]//Empirical Methods in Natural Language Processing, 2014: 787-798.
[26]	DANELLJAN M, BHAT G, KHAN F S, et al. ECO: Efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6931-6939.
[27]	DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4660-4669.
[28]	BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6182-6191.