Abstract:
To address tracking failures caused by target deformation, flipping, and occlusion in visual tracking, a template update algorithm based on image structural similarity is proposed. It updates the template dynamically so that the template adapts to changes in the target during tracking. In addition, a tracking feature enhancement module and a segmentation feature enhancement module are designed on top of the SiamMask network. The tracking feature enhancement module combines non-local operations with convolutional downsampling to establish contextual correlations, strengthen target features, and suppress background interference. This improves tracking robustness and mitigates the feature weakening caused by occlusion of the target. The segmentation feature enhancement module introduces the convolutional block attention module (CBAM) and deformable convolution, which improve the network's ability to capture channel and spatial features and to adaptively learn the shape and contour of the target. The resulting gain in segmentation accuracy in turn improves tracking accuracy. Experiments show that the proposed algorithm performs well and stably. Compared with the SiamMask baseline, it improves the expected average overlap (EAO) by 0.052, 0.053, and 0.025 on the VOT2016, VOT2018, and VOT2019 datasets, respectively, and reduces the robustness score (lower is better) by 0.06, 0.079, and 0.156, while running in real time at an average of 91 frames per second.
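The abstract names image structural similarity (SSIM, reference [13]) as the basis of the template update but does not spell out the update rule. The sketch below is a minimal pure-Python illustration under explicit assumptions: target patches are flattened grayscale lists, SSIM is evaluated over a single global window instead of being averaged over local windows, and the roles given to the queue length N and the thresholds $\varDelta_1$, $\varDelta_2$ of Table 5 are hypothetical, as is the class name `SSIMTemplateUpdater`. The defaults follow the N = 5, $\varDelta_1$ = 0.25, $\varDelta_2$ = 0.2 row of Table 5.

```python
from collections import deque
from typing import Deque, List, Sequence

# Stabilizing constants from Wang et al. (2004) for 8-bit images (L = 255).
K1, K2, DYNAMIC_RANGE = 0.01, 0.03, 255.0
C1 = (K1 * DYNAMIC_RANGE) ** 2
C2 = (K2 * DYNAMIC_RANGE) ** 2


def ssim(x: Sequence[float], y: Sequence[float]) -> float:
    """Single-window SSIM between two equal-length grayscale patches.

    Simplification (assumption): the standard SSIM index averages this
    expression over local windows; one global window is used here.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("patches must be non-empty and of equal length")
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
    den = (mu_x * mu_x + mu_y * mu_y + C1) * (var_x + var_y + C2)
    return num / den


class SSIMTemplateUpdater:
    """Hypothetical SSIM-gated template update, not the authors' published rule.

    Assumed roles of the Table 5 parameters:
      queue_len (N): number of recent, accepted target patches remembered;
      delta1: replace the template only if 1 - SSIM(patch, template) > delta1,
              i.e. the target has drifted away from the current template;
      delta2: ...and only if the mean 1 - SSIM against the remembered patches
              is < delta2, i.e. the change is gradual (deformation) rather
              than abrupt (e.g. occlusion).
    """

    def __init__(self, template: Sequence[float], queue_len: int = 5,
                 delta1: float = 0.25, delta2: float = 0.2) -> None:
        self.template: List[float] = list(template)
        self.delta1 = delta1
        self.delta2 = delta2
        # The initial template seeds the history; the deque drops its oldest entry when full.
        self.history: Deque[List[float]] = deque([self.template], maxlen=queue_len)

    def step(self, patch: Sequence[float]) -> bool:
        """Process the target patch of one frame; return True if the template was replaced."""
        current = list(patch)
        drift = 1.0 - ssim(current, self.template)
        spread = sum(1.0 - ssim(current, h) for h in self.history) / len(self.history)
        consistent = spread < self.delta2
        updated = drift > self.delta1 and consistent
        if updated:
            self.template = current
        if consistent:
            # Only patches that agree with recent history are remembered.
            self.history.append(current)
        return updated
```

With these defaults, a slow brightness ramp on a patch of fixed structure eventually replaces the template, whereas a sudden change immediately after initialization never does: the history then holds only the template, so the drift and the spread coincide and cannot be both above 0.25 and below 0.2.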
Key words:
- object tracking
- image segmentation
- SiamMask
- template update
- feature enhancement
Table 1. EAO results for different visual attributes on the VOT2016 dataset
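
| Tracker | Overall EAO | EAO (occlusion) | EAO (camera motion) | EAO (scale change) | EAO (illumination change) | EAO (motion change) | EAO (unassigned) |
|---|---|---|---|---|---|---|---|
| SiamFC [2] | 0.234 | 0.161 | 0.191 | 0.242 | 0.180 | 0.231 | 0.059 |
| MDNet [17] | 0.257 | 0.218 | 0.238 | 0.312 | 0.313 | 0.252 | 0.030 |
| C-COT [18] | 0.331 | 0.246 | 0.249 | 0.327 | 0.402 | 0.354 | 0.154 |
| SiamRPN [3] | 0.344 | 0.117 | 0.205 | 0.280 | 0.270 | 0.176 | 0.065 |
| DaSiamRPN [4] | 0.411 | 0.241 | 0.280 | 0.422 | 0.233 | 0.294 | 0.106 |
| SiamMask [6] | 0.433 | 0.325 | 0.394 | 0.444 | 0.463 | 0.409 | 0.109 |
| Ours | 0.485 | 0.470 | 0.472 | 0.527 | 0.617 | 0.470 | 0.104 |

Table 2. Results of different tracking algorithms on the VOT2016 dataset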
Table 3. Results of different tracking algorithms on the VOT2018 dataset
Table 4. Results of different tracking algorithms on the VOT2019 dataset
Table 5. Results of different template update parameters on the VOT2018 dataset
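
| Queue length N | $\varDelta_1$ | $\varDelta_2$ | EAO | Segmentation speed/(frame·s⁻¹) |
|---|---|---|---|---|
| 5 | 0.2 | 0.15 | 0.362 | 92 |
| 5 | 0.2 | 0.2 | 0.375 | 96 |
| 5 | 0.2 | 0.25 | 0.366 | 98 |
| 5 | 0.25 | 0.15 | 0.381 | 102 |
| 5 | 0.25 | 0.2 | 0.394 | 105 |
| 5 | 0.25 | 0.25 | 0.389 | 105 |
| 5 | 0.30 | 0.15 | 0.381 | 104 |
| 5 | 0.30 | 0.2 | 0.379 | 105 |
| 5 | 0.30 | 0.25 | 0.380 | 106 |
| 10 | 0.2 | 0.15 | 0.373 | 74 |
| 10 | 0.2 | 0.2 | 0.379 | 77 |
| 10 | 0.2 | 0.25 | 0.385 | 80 |
| 10 | 0.25 | 0.15 | 0.384 | 79 |
| 10 | 0.25 | 0.2 | 0.396 | 82 |
| 10 | 0.25 | 0.25 | 0.388 | 84 |
| 10 | 0.30 | 0.15 | 0.376 | 88 |
| 10 | 0.30 | 0.2 | 0.378 | 89 |
| 10 | 0.30 | 0.25 | 0.380 | 91 |

Table 6. Ablation results on the VOT2016 dataset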

| Number of modules added to SiamMask | Accuracy↑ | Robustness↓ | EAO↑ | Segmentation speed/(frame·s⁻¹)↑ | ΔEAO↑ |
|---|---|---|---|---|---|
| 0 (baseline) | 0.622 | 0.214 | 0.433 | 108 | |
| 1 | 0.623 | 0.210 | 0.448 | 106 | +0.015 |
| 1 | 0.631 | 0.210 | 0.447 | 107 | +0.014 |
| 1 | 0.616 | 0.228 | 0.440 | 98 | +0.007 |
| 2 | 0.637 | 0.182 | 0.470 | 93 | +0.037 |
| 3 (full model) | 0.630 | 0.154 | 0.485 | 91 | +0.052 |

Table 7. Ablation results on the VOT2018 dataset

| Number of modules added to SiamMask | Accuracy↑ | Robustness↓ | EAO↑ | Segmentation speed/(frame·s⁻¹)↑ | ΔEAO↑ |
|---|---|---|---|---|---|
| 0 (baseline) | 0.609 | 0.276 | 0.380 | 107 | |
| 1 | 0.601 | 0.239 | 0.394 | 105 | +0.014 |
| 1 | 0.607 | 0.267 | 0.395 | 107 | +0.015 |
| 1 | 0.606 | 0.276 | 0.403 | 98 | +0.023 |
| 2 | 0.612 | 0.234 | 0.420 | 93 | +0.040 |
| 3 (full model) | 0.603 | 0.197 | 0.433 | 91 | +0.053 |

Table 8. Ablation results on the VOT2019 dataset

| Number of modules added to SiamMask | Accuracy↑ | Robustness↓ | EAO↑ | Segmentation speed/(frame·s⁻¹)↑ | ΔEAO↑ |
|---|---|---|---|---|---|
| 0 (baseline) | 0.594 | 0.572 | 0.274 | 109 | |
| 1 | 0.596 | 0.511 | 0.285 | 106 | +0.011 |
| 1 | 0.606 | 0.507 | 0.280 | 108 | +0.006 |
| 1 | 0.606 | 0.492 | 0.286 | 98 | +0.012 |
| 2 | 0.611 | 0.477 | 0.287 | 95 | +0.013 |
| 3 (full model) | 0.601 | 0.416 | 0.299 | 92 | +0.025 |

Table 9. Ablation results on the DAVIS2016 dataset

| Number of modules added to SiamMask | mIoU (0.30) | mIoU (0.35) | mIoU (0.40) | mIoU (0.45) | Segmentation speed/(frame·s⁻¹)↑ |
|---|---|---|---|---|---|
| 0 (baseline) | 0.637 | 0.637 | 0.633 | 0.626 | 79 |
| 1 | 0.670 | 0.675 | 0.674 | 0.673 | 71 |
| 1 | 0.674 | 0.670 | 0.662 | 0.649 | 78 |
| 1 | 0.681 | 0.674 | 0.664 | 0.650 | 73 |
| 2 | 0.675 | 0.677 | 0.677 | 0.673 | 71 |
| 3 (full model) | 0.686 | 0.690 | 0.692 | 0.690 | 67 |

Table 10. Ablation results on the DAVIS2017 dataset
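
| Number of modules added to SiamMask | mIoU (0.30) | mIoU (0.35) | mIoU (0.40) | mIoU (0.45) | Segmentation speed/(frame·s⁻¹)↑ |
|---|---|---|---|---|---|
| 0 (baseline) | 0.499 | 0.498 | 0.495 | 0.490 | 84 |
| 1 | 0.505 | 0.508 | 0.509 | 0.507 | 75 |
| 1 | 0.525 | 0.525 | 0.522 | 0.517 | 83 |
| 1 | 0.514 | 0.511 | 0.505 | 0.496 | 77 |
| 2 | 0.526 | 0.527 | 0.526 | 0.522 | 75 |
| 3 (full model) | 0.525 | 0.529 | 0.530 | 0.528 | 70 |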
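The headline numbers in the abstract can be recomputed from the first (SiamMask baseline) and last (full model) rows of the VOT ablation tables. ΔEAO is the full model's EAO minus the baseline's. Because the VOT robustness score is lower-is-better, the quoted robustness improvement is the baseline score minus the full model's. A small Python check follows; the dictionary layout and the function name `gains` are illustrative only.

```python
# (robustness, EAO, speed in frames/s) of the SiamMask baseline and the
# full model, copied from the first and last rows of the VOT ablation tables.
ABLATION = {
    "VOT2016": {"baseline": (0.214, 0.433, 108), "full": (0.154, 0.485, 91)},
    "VOT2018": {"baseline": (0.276, 0.380, 107), "full": (0.197, 0.433, 91)},
    "VOT2019": {"baseline": (0.572, 0.274, 109), "full": (0.416, 0.299, 92)},
}


def gains(baseline, full):
    """Return (EAO gain, robustness-score drop), rounded to three decimals.

    EAO is higher-is-better; the robustness score is lower-is-better,
    so its improvement is baseline minus full.
    """
    rob_b, eao_b, _ = baseline
    rob_f, eao_f, _ = full
    return round(eao_f - eao_b, 3), round(rob_b - rob_f, 3)


for name, rows in ABLATION.items():
    d_eao, d_rob = gains(rows["baseline"], rows["full"])
    print(f"{name}: EAO +{d_eao:.3f}, robustness score -{d_rob:.3f}")

speeds = [rows["full"][2] for rows in ABLATION.values()]
print(f"mean speed of the full model: {sum(speeds) / len(speeds):.1f} frames/s")
```

Running it reproduces the EAO gains of 0.052, 0.053, and 0.025 and the robustness-score drops of 0.06, 0.079, and 0.156. The three full-model speeds (91, 91, and 92 frames/s) average about 91.3 frames/s, consistent with the 91 frames per second quoted in the abstract.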

[1] XIAO H, LIU X. Robust target tracking based on spatio-temporal context learning[J]. Journal of Information Hiding and Multimedia Signal Processing, 2019, 10(1): 212-220.
[2] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
[3] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[4] ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 103-119.
[5] LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4277-4286.
[6] HU W M, WANG Q, ZHANG L, et al. SiamMask: a framework for fast online object tracking and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3072-3089.
[7] PARK E, BERG A C. Meta-Tracker: fast and robust online adaptation for visual object trackers[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 587-604.
[8] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1781-1789.
[9] ZHANG L C, GONZALEZ-GARCIA A, VAN DE WEIJER J, et al. Learning the model update for Siamese trackers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4009-4018.
[10] XU Y D, WANG Z Y, LI Z X, et al. SiamFC++: towards robust and accurate visual tracking with target estimation guidelines[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12549-12556.
[11] CHEN Z D, ZHONG B N, LI G R, et al. Siamese box adaptive network for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6667-6676.
[12] GUO D Y, WANG J, CUI Y, et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6268-6276.
[13] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[14] WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[15] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
[16] ZHU X Z, HU H, LIN S, et al. Deformable ConvNets V2: more deformable, better results[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 9300-9308.
[17] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4293-4302.
[18] DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: learning continuous convolution operators for visual tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 472-488.
[19] VOIGTLAENDER P, LUITEN J, TORR P H S, et al. Siam R-CNN: visual tracking by re-detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6577-6587.
[20] SHEN Q H, QIAO L, GUO J Y, et al. Unsupervised learning of accurate Siamese tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 8091-8100.
[21] WANG G T, LUO C, XIONG Z W, et al. SPM-Tracker: series-parallel matching for real-time visual object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3638-3647.

