Target tracking algorithm based on efficient attention and context awareness

柏罗; 张宏立; 王聪

doi: 10.13700/j.bh.1001-5965.2021.0013

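School of Electrical Engineering, Xinjiang University, Urumqi 830047, China

Funds:

National Natural Science Foundation of China 51767022

National Natural Science Foundation of China 51967019

Corresponding author: ZHANG Hongli, E-mail: 1606829274@qq.com

CLC number: TP391.4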
Publication history
- Received: 2021-01-11
- Accepted: 2021-01-22
- Published online: 2021-03-10
- Issue published: 2022-07-20

Abstract:
Siamese network trackers built on template matching lack a holistic perception of the target, which tends to cause inaccurate target state estimation and target loss in complex environments. To address this, two lightweight modules are designed on top of the Siamese network to achieve more accurate and more robust target tracking. An efficient channel attention module is embedded after the feature-extraction backbone network; it extracts target features efficiently and enhances their discriminative representation, so that the network pays more attention to target information. The features obtained after template matching then pass through a local context awareness module, which enhances the network's overall perception of the target and helps it cope with the complex and changeable environments encountered during tracking. An anchor-free state estimation strategy is adopted to estimate the target state accurately. Experimental results show that the proposed algorithm, SiamCC, outperforms algorithms such as DaSiamRPN and ATOM on the OTB100, VOT2016 and VOT2018 datasets, with a tracking speed of 85 frames/s.
Keywords:
- machine vision
- target tracking
- Siamese network
- channel attention
- context awareness
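The abstract does not spell out the internals of the efficient channel attention module. As a rough illustration only, the sketch below follows the general ECA-Net recipe (global average pooling, a 1D convolution across the channel axis, then a sigmoid gate) in plain Python. The function names, the nested-list tensor layout, and the defaults `gamma=2` and `b=1` are assumptions made for this example, not the authors' implementation.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: the odd integer derived from (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_weights(feature, conv_kernel):
    """Return one attention weight in (0, 1) per channel.

    feature: list of C channels, each a list of H rows of W floats.
    conv_kernel: list of k floats (k odd), shared by all channels.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Local cross-channel interaction: 1D convolution over the channel axis,
    # zero-padded so that the output still has C entries.
    pad = len(conv_kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[c + j] for j, w in enumerate(conv_kernel))
        for c in range(len(pooled))
    ]
    # Sigmoid gate maps each mixed descriptor to a weight in (0, 1).
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def eca_apply(feature, conv_kernel):
    """Rescale every channel of the feature map by its attention weight."""
    weights = eca_weights(feature, conv_kernel)
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feature, weights)]
```

In a trained network the kernel values would be learned; here they are passed in by hand. Because the 1D kernel is shared across channels, the module adds only `k` parameters, which is consistent with the "lightweight" description in the abstract.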


Figure 1. Framework of the Siamese-network target tracking algorithm based on efficient attention and context awareness

Figure 2. Efficient channel attention module

Figure 3. Local context awareness module

Figure 4. Test results of different algorithms on the Gym video

Figure 5. Example of the assignment of positive and negative sample regions

Figure 6. Comparison of precision and success rate of different algorithms on the OTB100 dataset
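Precision and success rate are the two measures reported on OTB-style benchmarks. The sketch below shows how they are commonly computed from per-frame bounding boxes; it is an illustration, not the evaluation code used for Figure 6. The `(x, y, width, height)` box layout, the 20-pixel precision threshold, and the 21 evenly spaced overlap thresholds are assumptions taken from common practice, and all function names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))


def precision(predicted, ground_truth, threshold=20.0):
    """Fraction of frames whose center error is within the pixel threshold."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


def success_auc(predicted, ground_truth, steps=21):
    """Area under the success curve: mean success rate over overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Each function takes two equal-length lists of boxes, one predicted and one annotated box per frame.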

Figure 7. Comparison of precision of different algorithms under 11 types of challenges

Figure 8. Comparison of expected average overlap (EAO) of different algorithms on the VOT2016 dataset

Figure 9. Comparison of expected average overlap (EAO) of different algorithms on the VOT2018 dataset

Figure 10. Comparison of performance and speed of different algorithms on the VOT2018 dataset

Figure 11. Tracking results of different algorithms on different videos

Table 1. Comparison of test results of different algorithms on the VOT2016 dataset

Algorithm	Accuracy	Robustness	EAO
SiamCC (ours)	0.61	0.16	0.448
SPM	0.62	0.21	0.434
DaSiamRPN	0.61	0.22	0.411
ECO	0.55	0.20	0.375
C-RPN	0.59	0.27	0.363
SiamRPN	0.56	0.26	0.344
STRCF	0.55		0.313
SASiam	0.54	0.34	0.291
MDNet	0.51	0.25	0.283
SiamFC	0.53	0.46	0.235

Table 2. Comparison of test results of different algorithms on the VOT2018 dataset

Algorithm	Accuracy	Robustness	EAO
SiamCC (ours)	0.58	0.19	0.405
SiamRPN++	0.60	0.23	0.414
ATOM	0.59	0.20	0.401
SiamMask	0.61	0.28	0.380
SPM	0.58	0.30	0.338
DaSiamRPN	0.56	0.34	0.326
ECO	0.48	0.27	0.280
GradNet	0.51	0.38	0.247
SiamRPN	0.49	0.46	0.244
SASiam	0.50	0.46	0.236

Table 3. Ablation experiments of the proposed algorithm on the OTB2013 and VOT2016 datasets

ECANet	CCNet	ECAM	LCAM	OTB2013 success rate	VOT2016 EAO	Speed (frames/s)
-	-	-	-	0.661	0.417	93
✓	-	-	-	0.665	0.420	91
-	✓	-	-	0.667	0.422	88
-	-	✓	-	0.669	0.426	91
-	-	-	✓	0.676	0.431	88
✓	✓	-	-	0.672	0.429	85
-	-	✓	✓	0.684	0.448	85
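To read the ablation results as relative changes, the short sketch below recomputes gains against the first row of Table 3 (the configuration with no added module). The numbers are copied from the table; the label "baseline" for that first row and the function names are assumptions made for this example.

```python
# Rows copied from Table 3:
# (configuration, OTB2013 success rate, VOT2016 EAO, speed in frames/s).
# "baseline" is a label chosen here for the row with no added module.
ABLATION = [
    ("baseline", 0.661, 0.417, 93),
    ("ECANet", 0.665, 0.420, 91),
    ("CCNet", 0.667, 0.422, 88),
    ("ECAM", 0.669, 0.426, 91),
    ("LCAM", 0.676, 0.431, 88),
    ("ECANet + CCNet", 0.672, 0.429, 85),
    ("ECAM + LCAM", 0.684, 0.448, 85),
]


def relative_change(new, old):
    """Change of `new` relative to `old`, as a fraction of `old`."""
    return (new - old) / old


def summarize(rows):
    """Relative change of every configuration against the first row."""
    _, base_success, base_eao, base_fps = rows[0]
    summary = {}
    for name, success, eao, fps in rows[1:]:
        summary[name] = {
            "success_gain": relative_change(success, base_success),
            "eao_gain": relative_change(eao, base_eao),
            "speed_change": relative_change(fps, base_fps),
        }
    return summary


if __name__ == "__main__":
    for name, s in summarize(ABLATION).items():
        print(
            f"{name:15s} success {s['success_gain']:+.1%}  "
            f"EAO {s['eao_gain']:+.1%}  speed {s['speed_change']:+.1%}"
        )
```

With these figures, the full ECAM + LCAM configuration raises VOT2016 EAO by about 7.4% relative to the first row while running about 8.6% slower.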
