基于二阶注意力的Siamese网络视觉跟踪算法

侯志强; 陈茂林; 马靖媛; 郭凡; 余旺盛; 马素刚

doi:10.13700/j.bh.1001-5965.2022.0373

基于二阶注意力的Siamese网络视觉跟踪算法

doi: 10.13700/j.bh.1001-5965.2022.0373

侯志强^{1, 2, ,},
陈茂林^{1, 2},
马靖媛^{1, 2},
郭凡^{1, 2},
余旺盛³,
马素刚^{1, 2}

1.
西安邮电大学计算机学院，西安 710121
2.
西安邮电大学陕西省网络数据分析与智能处理重点实验室，西安 710121
3.
空军工程大学信息与导航学院，西安 710077

基金项目: 国家自然科学基金(62072370)

详细信息

通讯作者:
E-mail：hzq@xupt.edu.cn

中图分类号: TP391.4
计量
- 文章访问数: 404
- HTML全文浏览量: 97
- PDF下载量: 38
- 被引次数: 0
出版历程
- 收稿日期: 2022-05-17
- 录用日期: 2022-06-27
- 网络出版日期: 2022-09-16
- 整期出版日期: 2024-03-27

Siamese network visual tracking algorithm based on second-order attention

HOU Zhiqiang^{1, 2
, ,},
CHEN Maolin^{1, 2},
MA Jingyuan^{1, 2},
GUO Fan^{1, 2},
YU Wangsheng³,
MA Sugang^{1, 2}

1.
School of Computer Science and Technology，Xi’an University of Posts and Telecommunications，Xi’an 710121，China
2.
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing，Xi’an University of Posts and Telecommunications，Xi’an 710121，China
3.
School of Information and Navigation，Air Force Engineering University，Xi’an 710077，China

Funds: National Natural Science Foundation of China (62072370)

More Information

Corresponding author: E-mail：hzq@xupt.edu.cn

摘要

摘要:
为提升基于Siamese网络视觉跟踪算法的特征表达能力和判别能力，以获得更好的跟踪性能，提出了一种轻量级的基于二阶注意力的Siamese网络视觉跟踪算法。使用轻量级VGG-Net作为Siamese网络的主干，获取目标的深度特征；在Siamese网络的末端并行使用所提残差二阶池化网络和二阶空间注意力网络，获取具有通道相关性的二阶注意力特征和具有空间相关性的二阶注意力特征；使用残差二阶通道注意力特征和二阶空间注意力特征，通过双分支响应策略实现视觉跟踪。利用GOT-10k数据集对所提算法进行端到端的训练，并在OTB100和VOT2018数据集上进行验证。实验结果表明：所提算法的跟踪性能取得了显著提升，与基准算法SiamFC相比，在OTB100数据集上，精度和成功率分别提高了0.100和0.096，在VOT2018数据集上，预期平均重叠率(EAO)提高了0.077，跟踪速度达到了48帧/s。
- Siamese网络 /
- 视觉跟踪 /
- 残差二阶池化网络 /
- 二阶空间注意力网络 /
- 双分支响应策略
Abstract:
To improve the feature expression ability and discriminative ability of the visual tracking algorithm based on Siamese network and obtain better tracking performance, a lightweight Siamese network visual tracking algorithm based on second-order attention is proposed. Firstly, to obtain deep features of the object, the lightweight VGG-Net is used as the backbone of the Siamese network.Secondly, the residual second-order pooling network and the second-order spatial attention network are used in parallel at the end of the Siamese network to obtain the second-order attention features with channel correlation and the second-order attention features with spatial correlation.Finally, visual tracking is achieved through a double branch response strategy using the residual second-order channel attention features and the second-order spatial attention features. The proposed algorithm is trained end-to-end with the GOT-10k dataset and validated on the datasets OTB100 and VOT2018.The experimental results show that the tracking performance of the proposed algorithm has been significantly improved. Compared with the baseline algorithm SiamFC, on dataset OTB100, the precision and the success are increased by 0.100 and 0.096, respectively; on dataset VOT2018, the expected average overlap (EAO) increased by 0.077, tracking speed reached 48 frame/s.
- Siamese network /
- visual tracking /
- residual second-order pooling network /
- second-order spatial attention network /
- double branch response strategy

HTML全文

图 1 本文算法总体框架

Figure 1. Overall framework of proposed algorithm

下载: 全尺寸图片幻灯片

图 2 残差二阶池化网络

Figure 2. Residual second-order pooling network

下载: 全尺寸图片幻灯片

图 3 二阶池化流程

Figure 3. Second-order pooling process

下载: 全尺寸图片幻灯片

图 4 二阶空间注意力网络

Figure 4. Second-order spatial attention network

下载: 全尺寸图片幻灯片

图 5 部分视频序列跟踪结果

Figure 5. Tracking results of partial video sequence

下载: 全尺寸图片幻灯片

图 6 OTB100数据集的定量对比结果

Figure 6. Quantitative comparison results of OTB100 dataset

下载: 全尺寸图片幻灯片

表 1 归一化参数对跟踪性能的影响

Table 1. Influence of normalization parameters on tracking performance

C	精度	成功率
0.5	0.806	0.597
0.6	0.815	0.606
0.7	0.818	0.609
0.8	0.825	0.614
0.9	0.821	0.612
1.0	0.831	0.621
2.0	0.843	0.629
3.0	0.844	0.626
4.0	0.830	0.623
5.0	0.849	0.633
6.0	0.845	0.638
7.0	0.836	0.626
8.0	0.846	0.633
9.0	0.824	0.617
10.0	0.830	0.621

下载: 导出CSV

表 2 细化归一化参数平衡跟踪性能

Table 2. Refinement of normalization parameters to balance tracking performance

C	精度	成功率
5.0	0.849	0.633
5.1	0.843	0.629
5.2	0.832	0.629
5.3	0.839	0.630
5.4	0.842	0.629
5.5	0.848	0.638
5.6	0.836	0.625
5.7	0.836	0.622
5.8	0.829	0.628
5.9	0.844	0.626
6.0	0.845	0.638

下载: 导出CSV

表 3 消融实验结果

Table 3. Ablation experiment results

SiamFC	VGG-Net	SoA	ResSoP	DBR	精度	成功率	跟踪速度/(帧·s⁻¹)
√					0.777	0.580	58
√	√				0.828	0.622	55
√	√	√			0.845	0.638	53
√	√		√		0.864	0.649	50
√	√	√	√	√	0.877	0.676	48

下载: 导出CSV

表 4 不同属性下算法的跟踪成功率对比结果

Table 4. Comparison results of tracking success rate of algorithms under different attributes

算法	光照变化	平面外旋转	尺度变化	离开视野	目标形变	低分辨率	快速运动	目标遮挡	相似背景	平面内旋转	运动模糊
本文算法	0.681	0.658	0.668	0.607	0.640	0.698	0.646	0.634	0.640	0.649	0.674
SiamSE^[26]	0.670	0.635	0.678	0.613	0.617	0.697	0.663	0.637	0.620	0.651	0.651
ATOM^[25]	0.679	0.643	0.681	0.612	0.630	0.693	0.662	0.648	0.631	0.650	0.658
SiamDW-FC^[18]	0.627	0.617	0.618	0.595	0.562	0.616	0.632	0.606	0.582	0.613	0.655
TADT^[24]	0.674	0.643	0.650	0.623	0.602	0.644	0.655	0.638	0.619	0.618	0.668
GradNet^[23]	0.643	0.628	0.614	0.583	0.572	0.669	0.624	0.616	0.611	0.627	0.646
DaSiamRPN^[8]	0.655	0.644	0.637	0.537	0.645	0.636	0.621	0.611	0.642	0.652	0.625
SASiam^[22]	0.644	0.642	0.642	0.611	0.590	0.692	0.636	0.629	0.634	0.626	0.651
SiamRPN^[7]	0.652	0.628	0.621	0.548	0.619	0.663	0.603	0.589	0.598	0.632	0.624
SiamFC^[6]	0.536	0.550	0.560	0.473	0.546	0.574	0.575	0.532	0.525	0.567	0.587

下载: 导出CSV

表 5 VOT2018数据集实验对比结果

Table 5. VOT2018 dataset experimental comparison results

算法	EAO	准确率	鲁棒性
本文算法	0.265	0.534	0.389
SiamSE^[26]	0.270	0.538	0.432
GradNet^[23]	0.247	0.510	0.390
SiamDW-FC^[18]	0.230	0.500	0.490
SiamRPN^[7]	0.243	0.560	0.340
SASiam^[22]	0.236	0.500	0.459
SiamFC-tri^[29]	0.212	0.483	0.526
DSiam^[28]	0.196	0.512	0.646
DCFNet^[27]	0.182	0.470	0.543
SiamFC^[6]	0.188	0.500	0.590

下载: 导出CSV

参考文献(29)

[1]	MARVASTI-ZADEH S M, CHENG L, GHANEI-YAKHDAN H. Deep learning for visual tracking: A comprehensive survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 3943-3968. doi: 10.1109/TITS.2020.3046478
[2]	柏罗, 张宏立, 王聪. 基于高效注意力和上下文感知的目标跟踪算法[J]. 北京航空航天大学学报, 2022, 48(7): 1222-1232. BAI L, ZHANG H L, WANG C. Target tracking algorithm based on efficient attention and context awareness[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(7): 1222-1232(in Chinese).
[3]	李玺, 查宇飞, 张天柱, 等. 深度学习的视觉跟踪算法综述[J]. 中国图象图形学报, 2019, 24(12): 2057-2080. doi: 10.11834/jig.190372 LI X, ZHA Y F, ZHANG T Z, et al. A survey of visual object tracking algorithms based on deep learning[J]. Journal of Image and Graphics, 2019, 24(12): 2057-2080(in Chinese). doi: 10.11834/jig.190372
[4]	蒲磊, 李海龙, 侯志强, 等. 基于高层语义嵌入的孪生网络跟踪算法[J]. 北京航空航天大学学报, 2023, 49(4): 792-803. PU L, LI H L, HOU Z Q, et al. Siamese network tracking based on high level semantic embedding[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(4): 792-803(in Chinese).
[5]	张成煜, 侯志强, 蒲磊, 等. 基于在线学习的Siamese网络视觉跟踪算法[J]. 光电工程, 2021, 48(4): 4-14. ZHANG C Y, HOU Z Q, PU L, et al. Siamese network visual tracking algorithm based on online learning[J]. Opto-Electronic Engineering, 2021, 48(4): 4-14(in Chinese).
[6]	BERTINETTO L, VALMADRE J, HENRIQUES J F. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[7]	LI B, YAN J, WU W. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[8]	ZHU Z, WANG Q, LI B. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 101-117.
[9]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
[10]	WANG Q, ZHANG L, WU B. What deep CNNs benefit from global covariance pooling: An optimization perspective[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10771-10780.
[11]	WANG Q, XIE J, ZUO W. Deep CNNs meet global covariance pooling: Better representation and generalization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2582-2597.
[12]	GAO Z, WANG Q, ZHANG B, et al. Temporal-attentive covariance pooling networks for video recognition[C]//Proceedings of the Advances in Neural Information Processing Systems. [S. l. ]: NeurIPS, 2021.
[13]	蒲磊, 冯新喜, 侯志强, 等. 基于二阶池化网络的鲁棒视觉跟踪算法[J]. 电子学报, 2020, 48(8): 1472-1478. PU L, FENG X X, HOU Z Q, et al. Robust visual tracking based on second order pooling network[J]. Acta Electronica Sinica, 2020, 48(8): 1472-1478(in Chinese).
[14]	LI Y, ZHANG X, CHEN D M. SiamVGG: Visual tracking using deeper Siamese networks[EB/OL]. (2019-02-07)[2022-05-01]. https://arxiv.org/abs/1902.02804.
[15]	GAO Z, XIE J, WANG Q. Global second-order pooling convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3024-3033.
[16]	NG T, BALNTAS V, TIAN Y. SOLAR: Second-order loss and attention for image retrieval[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 253-270.
[17]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2022-05-01]. https://arxiv.org/abs/1409.1556.
[18]	ZHANG Z, PENG H. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4591-4600.
[19]	HUANG L, ZHAO X, HUANG K. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577.
[20]	WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.
[21]	KRISTAN M, LEONARDIS A, MATAS J. The sixth visual object tracking VOT2018 challenge results[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-53.
[22]	HE A, LUO C, TIAN X. A twofold Siamese network for real-time object tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4834-4843.
[23]	LI P, CHEN B, OUYANG W, et al. GradNet: Gradient-guided network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6162-6171.
[24]	LI X, MA C, WU B. Target-aware deep tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1369-1378.
[25]	DANELLJAN M, BHAT G, KHAN F S. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4660-4669.
[26]	SOSNOVIK I, MOSKALEV A, SMEULDERSA W M. Scale equivariance improves Siamese tracking[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 2765-2774.
[27]	WANG Q, GAO J, XING J. DCFNet: Discriminant correlation filters network for visual tracking[EB/OL]. (2017-04-13)[2022-05-01]. https://arxiv.org/abs/1704.04057.
[28]	GUO Q, FENG W, ZHOU C. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1763-1771.
[29]	DONG X, SHEN J. Triplet loss in Siamese network for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 459-474.