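Visual tracking algorithm based on template updating and dual feature enhancement

DING Qishuai^{1, 2, 3}, LEI Bangjun^{1, 2, 3}, MOU Qianxi^{1, 2, 3}, WU Zhengping^{1, 2, 3}

doi: 10.13700/j.bh.1001-5965.2024.0020

1. Hubei Key Laboratory of Intelligent Visual Monitoring for Hydropower Engineering, Yichang 443002, China
2. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
3. Yichang Key Laboratory of Hydropower Engineering Vision Supervision, Yichang 443002, China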

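Funds:

National Natural Science Foundation of China (61871258); Construction Project of the Hubei Key Laboratory of Intelligent Visual Monitoring for Hydropower Engineering (2019ZYYD007); Yichang Science and Technology Research and Development Project (A201130225)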

More Information

Corresponding author: E-mail: bangjun.lei@ieee.org

CLC number: TP391.4
Publication history
- Received: 2024-01-11
- Accepted: 2024-01-28
- Published online: 2024-02-27
- Issue published: 2026-04-30


Abstract

Abstract:
To address tracking failures caused by target deformation, flipping, and occlusion in visual tracking, a template updating algorithm based on image structural similarity is proposed, which dynamically updates the template to adapt to changes of the target during tracking. In addition, a tracking feature enhancement module and a segmentation feature enhancement module are designed on top of the SiamMask network. The tracking feature enhancement module combines non-local operations with convolutional downsampling to establish contextual associations, enhance target features, and suppress background interference, which improves tracking robustness and counters the feature weakening caused by occlusion. The segmentation feature enhancement module introduces the convolutional block attention module (CBAM) and deformable convolution to improve the network's ability to capture channel and spatial features and to adaptively learn the shape and contour of the target, which raises segmentation accuracy on the tracked target and, in turn, tracking accuracy. Experiments show that the proposed algorithm performs well and stably: compared with the SiamMask baseline, the expected average overlap (EAO) increases by 0.052, 0.053, and 0.025 and the robustness score (lower is better) decreases by 0.06, 0.079, and 0.156 on the VOT2016, VOT2018, and VOT2019 datasets, respectively, while running in real time at an average of 91 frames per second.
Keywords:
- object tracking
- image segmentation
- SiamMask
- template update
- feature enhancement
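The structural-similarity measure behind the template update can be sketched as follows. This is a minimal illustration, assuming a single global SSIM window over grayscale patches, following the formula of Wang et al. [13]. The function `should_update` and its `low`/`high` thresholds are hypothetical names for this sketch and are not the update rule of the paper, whose template queue and thresholds are not modelled here.

```python
import numpy as np


def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM of two equally sized grayscale images (Wang et al. [13])."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def should_update(template, candidate, low, high):
    """Hypothetical rule (an assumption, not the paper's): refresh the template
    when the candidate has drifted from it (SSIM < high) but still resembles
    it enough to be trusted (SSIM >= low)."""
    return bool(low <= global_ssim(template, candidate) < high)
```

With this rule a candidate identical to the template (SSIM of 1) never triggers an update, and a candidate that no longer resembles the template is rejected rather than adopted.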

Full text (HTML)

Figure 1. Framework of the visual tracking algorithm based on template updating and dual feature enhancement

Figure 2. Feature fusion network

Figure 3. Changes of the object before tracking failure

Figure 4. Correlation of the object between different video frames

Figure 5. Tracking feature enhancement module

Figure 6. Non-local operation
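As a rough illustration of the non-local operation, the sketch below implements the embedded-Gaussian block of Wang et al. [14] in NumPy for a single feature map. The weight names (`w_theta`, `w_phi`, `w_g`, `w_out`) are assumptions standing in for 1×1 convolutions, and the convolutional downsampling mentioned in the abstract is omitted, so this is not the module of the paper itself.

```python
import numpy as np


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block with a residual connection.

    x: feature map of shape (C, H, W).
    w_theta, w_phi, w_g: (C_mid, C) matrices acting as 1x1 convolutions.
    w_out: (C, C_mid) matrix projecting back to C channels.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                    # (C, N) with N = H * W
    theta = w_theta @ flat                        # queries, (C_mid, N)
    phi = w_phi @ flat                            # keys,    (C_mid, N)
    g = w_g @ flat                                # values,  (C_mid, N)
    logits = theta.T @ phi                        # pairwise affinities, (N, N)
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # each row sums to 1
    y = g @ attn.T                                # y[:, i] = sum_j attn[i, j] * g[:, j]
    return x + (w_out @ y).reshape(c, h, w)       # residual keeps the input features
```

Every output position aggregates values from all positions of the map, which is how the block builds the contextual associations described in the abstract. With `w_out` set to zero the block reduces to the identity.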

Figure 7. Segmentation feature enhancement module

Figure 8. Convolutional block attention module (CBAM)
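The CBAM structure of Woo et al. [15] can be sketched in NumPy as channel attention followed by spatial attention. The weight arguments (`w1`, `w2`, `kernel`) are assumptions for this illustration. How the paper wires CBAM into the segmentation branch is not shown here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C // r, C) and w2: (C, C // r) form the shared MLP."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)       # linear, ReLU, linear
    avg = x.mean(axis=(1, 2))                     # global average pooling, (C,)
    mx = x.max(axis=(1, 2))                       # global max pooling, (C,)
    return sigmoid(mlp(avg) + mlp(mx))            # per-channel weights, (C,)


def spatial_attention(x, kernel):
    """kernel: (2, k, k) convolution over the channel-wise mean and max maps."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])      # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))       # zero "same" padding
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (k, k), axis=(1, 2)
    )                                                     # (2, H, W, k, k)
    return sigmoid(np.einsum("chwij,cij->hw", windows, kernel))  # (H, W)


def cbam(x, w1, w2, kernel):
    x = x * channel_attention(x, w1, w2)[:, None, None]   # reweight channels
    return x * spatial_attention(x, kernel)[None]         # reweight positions
```

Both attention maps lie in (0, 1), so the block only rescales features: channels and positions judged informative keep more of their magnitude than the rest.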

Figure 9. Deformable convolution network

Figure 10. Comparison of visual attributes on the VOT2016 dataset

Figure 11. EAO rankings on the VOT2016 dataset

Figure 12. Comparison of tracking results

Figure 13. EAO rankings on the VOT2018 dataset

Figure 14. EAO rankings on the VOT2019 dataset

Figure 15. Comparison of segmentation results

Table 1. EAO scores for different visual attributes on the VOT2016 dataset

Tracker	Overall EAO	Occlusion	Camera motion	Scale change	Illumination change	Motion change	Unassigned
SiamFC [2]	0.234	0.161	0.191	0.242	0.180	0.231	0.059
MDNet [17]	0.257	0.218	0.238	0.312	0.313	0.252	0.030
C-COT [18]	0.331	0.246	0.249	0.327	0.402	0.354	0.154
SiamRPN [3]	0.344	0.117	0.205	0.280	0.270	0.176	0.065
DaSiamRPN [4]	0.411	0.241	0.280	0.422	0.233	0.294	0.106
SiamMask [6]	0.433	0.325	0.394	0.444	0.463	0.409	0.109
Ours	0.485	0.470	0.472	0.527	0.617	0.470	0.104

Table 2. Results of different tracking algorithms on the VOT2016 dataset

Tracker	Accuracy↑	Robustness↓	EAO↑
SiamMask [6]	0.622	0.214	0.433
SiamRPN++ [5]	0.640	0.200	0.464
UpdateNet [9]	0.610	0.210	0.481
Siam R-CNN [19]	0.645	0.173	0.461
ULAST-on [20]	0.603	0.214	0.417
Ours	0.630	0.154	0.485

Table 3. Results of different tracking algorithms on the VOT2018 dataset

Tracker	Accuracy↑	Robustness↓	EAO↑
SiamMask [6]	0.609	0.276	0.380
SiamRPN++ [5]	0.600	0.230	0.415
Siam R-CNN [19]	0.609	0.220	0.408
SiamFC++ [10]	0.587	0.183	0.426
ULAST-on [20]	0.571	0.286	0.355
Ours	0.603	0.197	0.433

Table 4. Results of different tracking algorithms on the VOT2019 dataset

Tracker	Accuracy↑	Robustness↓	EAO↑
SiamFC [2]	0.511	0.923	0.183
SiamRPN [3]	0.582	0.527	0.272
SiamMask [6]	0.594	0.572	0.274
SPM [21]	0.577	0.507	0.275
SiamRPN++ [5]	0.599	0.482	0.285
Ours	0.601	0.416	0.299

Table 5. Results of different template update parameters on the VOT2018 dataset

Queue length N	$\varDelta_1$	$\varDelta_2$	EAO	Segmentation speed/(frames·s⁻¹)
5	0.20	0.15	0.362	92
5	0.20	0.20	0.375	96
5	0.20	0.25	0.366	98
5	0.25	0.15	0.381	102
5	0.25	0.20	0.394	105
5	0.25	0.25	0.389	105
5	0.30	0.15	0.381	104
5	0.30	0.20	0.379	105
5	0.30	0.25	0.380	106
10	0.20	0.15	0.373	74
10	0.20	0.20	0.379	77
10	0.20	0.25	0.385	80
10	0.25	0.15	0.384	79
10	0.25	0.20	0.396	82
10	0.25	0.25	0.388	84
10	0.30	0.15	0.376	88
10	0.30	0.20	0.378	89
10	0.30	0.25	0.380	91
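The trade-off in Table 5 can be read off programmatically. The short sketch below copies the table rows and picks the highest-EAO setting for each queue length. The helper name `best_by_queue_length` is introduced only for this illustration.

```python
# Rows of Table 5: (queue length N, delta_1, delta_2, EAO, speed in frames/s).
TABLE_5 = [
    (5, 0.20, 0.15, 0.362, 92), (5, 0.20, 0.20, 0.375, 96), (5, 0.20, 0.25, 0.366, 98),
    (5, 0.25, 0.15, 0.381, 102), (5, 0.25, 0.20, 0.394, 105), (5, 0.25, 0.25, 0.389, 105),
    (5, 0.30, 0.15, 0.381, 104), (5, 0.30, 0.20, 0.379, 105), (5, 0.30, 0.25, 0.380, 106),
    (10, 0.20, 0.15, 0.373, 74), (10, 0.20, 0.20, 0.379, 77), (10, 0.20, 0.25, 0.385, 80),
    (10, 0.25, 0.15, 0.384, 79), (10, 0.25, 0.20, 0.396, 82), (10, 0.25, 0.25, 0.388, 84),
    (10, 0.30, 0.15, 0.376, 88), (10, 0.30, 0.20, 0.378, 89), (10, 0.30, 0.25, 0.380, 91),
]


def best_by_queue_length(rows):
    """Return {N: (delta_1, delta_2, EAO, speed)} for the highest-EAO row of each N."""
    best = {}
    for n, d1, d2, eao, fps in rows:
        if n not in best or eao > best[n][2]:
            best[n] = (d1, d2, eao, fps)
    return best


best = best_by_queue_length(TABLE_5)
```

Both queue lengths peak at the same thresholds (0.25 and 0.20). Going from N = 5 to N = 10 raises EAO by only 0.002 (0.394 to 0.396) while the speed drops from 105 to 82 frames per second.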

Table 6. Results of ablation experiments on the VOT2016 dataset

SiamMask	Template update algorithm	Tracking feature enhancement module	Segmentation feature enhancement module	Accuracy↑	Robustness↓	EAO↑	Segmentation speed/(frames·s⁻¹)↑	ΔEAO↑
√				0.622	0.214	0.433	108	-
√	√			0.623	0.210	0.448	106	0.015↑
√		√		0.631	0.210	0.447	107	0.014↑
√			√	0.616	0.228	0.440	98	0.007↑
√		√	√	0.637	0.182	0.470	93	0.037↑
√	√	√	√	0.630	0.154	0.485	91	0.052↑

Table 7. Results of ablation experiments on the VOT2018 dataset

SiamMask	Template update algorithm	Tracking feature enhancement module	Segmentation feature enhancement module	Accuracy↑	Robustness↓	EAO↑	Segmentation speed/(frames·s⁻¹)↑	ΔEAO↑
√				0.609	0.276	0.380	107	-
√	√			0.601	0.239	0.394	105	0.014↑
√		√		0.607	0.267	0.395	107	0.015↑
√			√	0.606	0.276	0.403	98	0.023↑
√		√	√	0.612	0.234	0.420	93	0.040↑
√	√	√	√	0.603	0.197	0.433	91	0.053↑

Table 8. Results of ablation experiments on the VOT2019 dataset

SiamMask	Template update algorithm	Tracking feature enhancement module	Segmentation feature enhancement module	Accuracy↑	Robustness↓	EAO↑	Segmentation speed/(frames·s⁻¹)↑	ΔEAO↑
√				0.594	0.572	0.274	109	-
√	√			0.596	0.511	0.285	106	0.011↑
√		√		0.606	0.507	0.280	108	0.006↑
√			√	0.606	0.492	0.286	98	0.012↑
√		√	√	0.611	0.477	0.287	95	0.013↑
√	√	√	√	0.601	0.416	0.299	92	0.025↑
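The headline gains quoted in the abstract follow directly from the first and last rows of the three VOT ablation tables. The sketch below recomputes them. The dictionary layout and the helper name `gains` are chosen only for this illustration.

```python
# (robustness, EAO) of the SiamMask baseline row and the full-model row
# in the VOT2016, VOT2018 and VOT2019 ablation tables.
ABLATION = {
    "VOT2016": {"baseline": (0.214, 0.433), "full": (0.154, 0.485)},
    "VOT2018": {"baseline": (0.276, 0.380), "full": (0.197, 0.433)},
    "VOT2019": {"baseline": (0.572, 0.274), "full": (0.416, 0.299)},
}


def gains(baseline, full):
    """Return (EAO increase, robustness decrease), rounded to 3 decimals.

    Robustness is lower-is-better, so the gain is baseline minus full.
    """
    rob_base, eao_base = baseline
    rob_full, eao_full = full
    return round(eao_full - eao_base, 3), round(rob_base - rob_full, 3)


summary = {name: gains(r["baseline"], r["full"]) for name, r in ABLATION.items()}
```

The resulting pairs are (0.052, 0.06), (0.053, 0.079), and (0.025, 0.156), matching the EAO and robustness improvements reported in the abstract.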

Table 9. Results of ablation experiments on the DAVIS2016 dataset

SiamMask	Template update algorithm	Tracking feature enhancement module	Segmentation feature enhancement module	mIoU (0.30)	mIoU (0.35)	mIoU (0.40)	mIoU (0.45)	Segmentation speed/(frames·s⁻¹)↑
√				0.637	0.637	0.633	0.626	79
√	√			0.670	0.675	0.674	0.673	71
√		√		0.674	0.670	0.662	0.649	78
√			√	0.681	0.674	0.664	0.650	73
√		√	√	0.675	0.677	0.677	0.673	71
√	√	√	√	0.686	0.690	0.692	0.690	67

Table 10. Results of ablation experiments on the DAVIS2017 dataset

SiamMask	Template update algorithm	Tracking feature enhancement module	Segmentation feature enhancement module	mIoU (0.30)	mIoU (0.35)	mIoU (0.40)	mIoU (0.45)	Segmentation speed/(frames·s⁻¹)↑
√				0.499	0.498	0.495	0.490	84
√	√			0.505	0.508	0.509	0.507	75
√		√		0.525	0.525	0.522	0.517	83
√			√	0.514	0.511	0.505	0.496	77
√		√	√	0.526	0.527	0.526	0.522	75
√	√	√	√	0.525	0.529	0.530	0.528	70
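The mIoU columns of the DAVIS tables can be illustrated with a small NumPy sketch. It assumes that the value in parentheses (0.30 to 0.45) is the threshold used to binarize the predicted mask probabilities. That reading is an assumption and is not stated in the tables. The function names are likewise illustrative.

```python
import numpy as np


def mask_iou(prob, gt, threshold):
    """IoU between a thresholded probability map and a binary ground-truth mask."""
    pred = np.asarray(prob) > threshold
    gt = np.asarray(gt).astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                      # both masks empty: treat as a perfect match
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(probs, gts, threshold):
    """Average IoU over a sequence of frames at one binarization threshold."""
    return float(np.mean([mask_iou(p, g, threshold) for p, g in zip(probs, gts)]))


# Toy 2x2 frame: lowering the threshold admits a false-positive pixel,
# raising it drops a true-positive pixel.
prob = np.array([[0.90, 0.42], [0.32, 0.10]])
gt = np.array([[1, 1], [0, 0]])
ious = {t: mask_iou(prob, gt, t) for t in (0.30, 0.35, 0.45)}
```

On the toy frame the IoU is 2/3 at threshold 0.30, 1.0 at 0.35, and 0.5 at 0.45. This sensitivity to the threshold is why the tables report mIoU at several thresholds.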

References (21)

[1]	XIAO H, LIU X. Robust target tracking based on spatio-temporal context learning[J]. Journal of Information Hiding and Multimedia Signal Processing, 2019, 10(1): 212-220.
[2]	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[3]	LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[4]	ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 103-119.
[5]	LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4277-4286.
[6]	HU W M, WANG Q, ZHANG L, et al. SiamMask: a framework for fast online object tracking and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3072-3089.
[7]	PARK E, BERG A C. Meta-Tracker: fast and robust online adaptation for visual object trackers[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 587-604.
[8]	GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1781-1789.
[9]	ZHANG L C, GONZALEZ-GARCIA A, VAN DE WEIJER J, et al. Learning the model update for Siamese trackers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4009-4018.
[10]	XU Y D, WANG Z Y, LI Z X, et al. SiamFC++: towards robust and accurate visual tracking with target estimation guidelines[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12549-12556.
[11]	CHEN Z D, ZHONG B N, LI G R, et al. Siamese box adaptive network for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6667-6676.
[12]	GUO D Y, WANG J, CUI Y, et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6268-6276.
[13]	WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[14]	WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[15]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
[16]	ZHU X Z, HU H, LIN S, et al. Deformable ConvNets V2: more deformable, better results[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 9300-9308.
[17]	NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4293-4302.
[18]	DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: learning continuous convolution operators for visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 472-488.
[19]	VOIGTLAENDER P, LUITEN J, TORR P H S, et al. Siam R-CNN: visual tracking by re-detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6577-6587.
[20]	SHEN Q H, QIAO L, GUO J Y, et al. Unsupervised learning of accurate Siamese tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 8091-8100.
[21]	WANG G T, LUO C, XIONG Z W, et al. SPM-Tracker: series-parallel matching for real-time visual object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3638-3647.