Abstract: To improve the tracking capability of the fully convolutional Siamese network (SiamFC) tracker in complex scenes and to alleviate target drift during tracking, a real-time object tracking algorithm coupled with a spatial attention mechanism is proposed. Building on the SiamFC framework, an improved visual geometry group (VGG) network is used as the backbone to strengthen the tracker's modeling of deep target features. The self-attention mechanism is optimized into a plug-and-play, lightweight single convolution attention module (SCAM), which decomposes spatial attention into two parallel one-dimensional feature encoding processes, reducing the computational complexity of spatial attention. The initial target template is retained as the first template, a second template is dynamically selected by analyzing changes of the connected regions in the tracking response map, and the target is located after fusing the two templates. Experimental results show that, compared with SiamFC, the proposed algorithm improves the success rate on the OTB100, LaSOT, and UAV123 datasets by 0.082, 0.045, and 0.045, respectively, and the precision by 0.118, 0.051, and 0.062. On the VOT2018 dataset, the proposed algorithm improves accuracy, robustness, and expected average overlap (EAO) over SiamFC by 0.029, 0.276, and 0.134, respectively. The tracker runs at 70 frames per second, which satisfies real-time tracking requirements.
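The core SCAM idea described above, factoring spatial attention into two parallel one-dimensional encodings instead of one quadratic-cost two-dimensional map, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the average pooling, the shared three-tap smoothing kernel, and the recombination of the two 1-D maps by broadcasting are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(seq, kernel):
    """'Same' 1-D convolution along the last axis of a (C, L) array."""
    pad = len(kernel) // 2
    padded = np.pad(seq, ((0, 0), (pad, pad)), mode="edge")
    out = np.empty_like(seq)
    for i in range(seq.shape[-1]):
        out[:, i] = padded[:, i:i + len(kernel)] @ kernel
    return out

def scam_like_attention(x, kernel=np.array([0.25, 0.5, 0.25])):
    """Hypothetical sketch of spatial attention decomposed into two
    parallel 1-D feature encodings, applied to a (C, H, W) feature map.

    A full 2-D attention map costs O(H*W) per position; encoding the
    height and width axes separately costs only O(H) + O(W).
    """
    # Encode along height: average over the width axis -> (C, H)
    a_h = sigmoid(conv1d_same(x.mean(axis=2), kernel))
    # Encode along width: average over the height axis -> (C, W)
    a_w = sigmoid(conv1d_same(x.mean(axis=1), kernel))
    # Recombine the two 1-D maps by broadcasting and reweight the input
    return x * a_h[:, :, None] * a_w[:, None, :]
```

The same single kernel serves both branches, which is one plausible reading of "single convolution" in SCAM; the paper's exact module may differ.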
Key words: object tracking / Siamese network / attention mechanism / model update / deep learning
Table 1. Experimental results for different $\alpha$ values on the OTB100 dataset

| $\alpha$ | Success rate | Precision |
| --- | --- | --- |
| 0.50 | 0.655 | 0.870 |
| 0.55 | 0.669 | 0.890 |
| 0.60 | 0.651 | 0.869 |
| 0.65 | 0.658 | 0.878 |
| 0.70 | 0.659 | 0.876 |
| 0.75 | 0.660 | 0.870 |
| 0.80 | 0.655 | 0.875 |
| 0.85 | 0.659 | 0.875 |
| 0.90 | 0.659 | 0.880 |
| 0.95 | 0.653 | 0.866 |
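The weight $\alpha$ swept in Table 1 balances the two templates, with $\alpha = 0.55$ giving the best success rate and precision on OTB100. A common formulation, sketched below as an assumption (the paper only reports the weight sweep, not the exact fusion rule), is a convex combination of the two templates' response maps, with the target located at the peak of the fused map:

```python
import numpy as np

def fuse_responses(r_init, r_dynamic, alpha=0.55):
    """Convex combination of the two template response maps.

    alpha weights the response of the initial (first) template and
    1 - alpha the dynamically selected second template; alpha = 0.55
    performed best on OTB100 (Table 1). The fusion rule itself is an
    illustrative assumption.
    """
    return alpha * r_init + (1.0 - alpha) * r_dynamic

def locate_target(response):
    """Target position is taken as the peak of the fused response map."""
    return np.unravel_index(np.argmax(response), response.shape)
```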
Table 2. Experimental results of different algorithms on the VOT2018 dataset

| Algorithm | Accuracy | Robustness | EAO |
| --- | --- | --- | --- |
| SiamFC[13] | 0.503 | 0.585 | 0.188 |
| DSiam[36] | 0.512 | 0.646 | 0.196 |
| UpdateNet[18] | 0.518 | 0.454 | 0.244 |
| CCOT[37] | 0.494 | 0.318 | 0.267 |
| ECO[27] | 0.484 | 0.276 | 0.280 |
| SiamVGG[38] | 0.531 | 0.318 | 0.286 |
| DeepCSRDCF[39] | 0.489 | 0.276 | 0.293 |
| Ours | 0.532 | 0.309 | 0.322 |
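The abstract's second-template selection analyzes how connected regions in the response map change: a single compact peak region suggests a reliable result, while multiple regions hint at distractors or drift. A minimal sketch of counting 4-connected regions above a threshold follows; the threshold value and the decision rule built on the count are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def count_connected_regions(response, thresh=0.5):
    """Count 4-connected regions above `thresh` in a 2-D response map.

    Illustrative sketch: one region above threshold suggests the current
    result is reliable enough to serve as the second template; several
    regions suggest distractors. Threshold and rule are assumptions.
    """
    mask = response > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    regions = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                regions += 1                      # new region found
                stack = [(i, j)]
                seen[i, j] = True
                while stack:                      # flood fill the region
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
    return regions
```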
Table 3. Results of the ablation study

| Variant | Success rate | Precision |
| --- | --- | --- |
| SiamFC | 0.587 | 0.772 |
| SiamFC-V | 0.624 | 0.833 |
| SiamFC-V-A | 0.657 | 0.875 |
| SiamFC-V-A-U | 0.669 | 0.890 |
[1] GAO M, JIN L S, JIANG Y Y, et al. Manifold Siamese network: A novel visual tracking ConvNet for autonomous vehicles[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(4): 1612-1623. doi: 10.1109/TITS.2019.2930337
[2] KOU Z, WU J F, WANG H L, et al. Obstacle visual sensing based on deep learning for low-altitude small unmanned aerial vehicles[J]. Scientia Sinica (Informationis), 2020, 50(5): 692-703 (in Chinese). doi: 10.1360/N112019-00034
[3] MARVASTI-ZADEH S M, CHENG L, GHANEI-YAKHDAN H, et al. Deep learning for visual tracking: A comprehensive survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 3943-3968. doi: 10.1109/TITS.2020.3046478
[4] LUO J H, HAN Y, FAN L Y. Underwater acoustic target tracking: A review[J]. Sensors, 2018, 18(2): 112. doi: 10.3390/s18010112
[5] LIU F, SUN Y N, WANG H J, et al. Adaptive UAV target tracking algorithm based on residual learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(10): 1874-1882 (in Chinese). doi: 10.13700/j.bh.1001-5965.2019.0551
[6] HAN M, WANG J Q, WANG J T, et al. Comprehensive survey on target tracking based on Siamese network[J]. Journal of Hebei University of Science and Technology, 2022, 43(1): 27-41 (in Chinese).
[7] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. doi: 10.1109/TPAMI.2014.2345390
[8] ZHANG K H, ZHANG L, LIU Q S, et al. Fast visual tracking via dense spatio-temporal context learning[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2014: 127-141.
[9] DANELLJAN M, KHAN F S, FELSBERG M, et al. Adaptive color attributes for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 1090-1097.
[10] MA C, HUANG J B, YANG X K, et al. Hierarchical convolutional features for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 3074-3082.
[11] WANG L J, OUYANG W L, WANG X G, et al. Visual tracking with fully convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 3119-3127.
[12] BHAT G, JOHNANDER J, DANELLJAN M, et al. Unveiling the power of deep tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 493-509.
[13] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[14] HE A F, LUO C, TIAN X M, et al. A twofold Siamese network for real-time object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4834-4843.
[15] PU L, FENG X X, HOU Z Q, et al. SiamDA: Dual attention Siamese network for real-time visual tracking[J]. Signal Processing: Image Communication, 2021, 95: 116293. doi: 10.1016/j.image.2021.116293
[16] GUPTA D K, ARYA D, GAVVES E. Rotation equivariant Siamese networks for tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 12357-12366.
[17] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5000-5008.
[18] ZHANG L C, GONZALEZ-GARCIA A, VAN DE WEIJER J, et al. Learning the model update for Siamese trackers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 4009-4018.
[19] ZHU Z, WU W, ZOU W, et al. End-to-end flow correlation tracking with spatial-temporal attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 548-557.
[20] ZHANG C Y, WANG H, WEN J W, et al. Deeper Siamese network with stronger feature representation for visual tracking[J]. IEEE Access, 2020, 8: 119094-119104. doi: 10.1109/ACCESS.2020.3005511
[21] WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[22] CAO Y, XU J R, LIN S, et al. GCNet: Non-local networks meet squeeze-excitation networks and beyond[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 1971-1980.
[23] WANG M M, LIU Y, HUANG Z Y. Large margin object tracking with circulant feature maps[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4800-4808.
[24] DANELLJAN M, HÄGER G, KHAN F S, et al. Discriminative scale space tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8): 1561-1575. doi: 10.1109/TPAMI.2016.2609928
[25] BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: Complementary learners for real-time tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1401-1409.
[26] DANELLJAN M, HÄGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 4310-4318.
[27] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: Efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6931-6939.
[28] DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 58-66.
[29] ZHANG Z P, PENG H W. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4586-4595.
[30] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[31] BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6181-6190.
[32] ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 103-119.
[33] DANELLJAN M, VAN GOOL L, TIMOFTE R. Probabilistic regression for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 7181-7190.
[34] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4655-4664.
[35] LI P X, CHEN B Y, OUYANG W L, et al. GradNet: Gradient-guided network for visual object tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6161-6170.
[36] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1781-1789.
[37] DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: Learning continuous convolution operators for visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 472-488.
[38] YIN Z J, WEN C H, HUANG Z Y, et al. SiamVGG-LLC: Visual tracking using LLC and deeper Siamese networks[C]//Proceedings of the IEEE International Conference on Communication Technology. Piscataway: IEEE Press, 2020: 1683-1687.
[39] LUKEŽIC A, VOJÍR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4847-4856.
[40] XU T Y, FENG Z H, WU X J, et al. Joint group feature selection and discriminative filter learning for robust visual object tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 7949-7959.
[41] DAI K N, WANG D, LU H C, et al. Visual tracking via adaptive spatially-regularized correlation filters[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4665-4674.
[42] ZHANG Y H, WANG L J, QI J Q, et al. Structured Siamese network for real-time visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 355-370.
[43] LI F, TIAN C, ZUO W M, et al. Learning spatial-temporal regularized correlation filters for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4904-4913.
[44] CHOI J, CHANG H J, FISCHER T, et al. Context-aware deep feature compression for high-speed visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 479-488.