Abstract: Siamese network trackers based on template matching lack a holistic perception of the target, so they tend to estimate the target state imprecisely and to lose the target in complex environments. To address this, two lightweight modules are designed on top of the Siamese network to achieve more accurate and more robust target tracking. First, an efficient channel attention module is embedded after the feature-extraction backbone; it extracts target features efficiently and strengthens their discriminative representation, so that the network pays more attention to target information. Second, the features produced by template matching pass through a local context awareness module, which strengthens the network's holistic perception of the target and helps it cope with the complex and changeable environments encountered during tracking. Finally, an anchor-free state estimation strategy is adopted to estimate the target state precisely. Experimental results show that the proposed algorithm, SiamCC, outperforms algorithms such as DaSiamRPN and ATOM on the OTB100, VOT2016 and VOT2018 datasets, while tracking at 85 frames/s.
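The channel attention step described in the abstract can be illustrated with a minimal sketch in the spirit of ECA-Net: pool each channel to one scalar, run a small 1D convolution across neighbouring channels, squash the result with a sigmoid, and rescale the channels. This is not the authors' implementation. The function names, the nested-list feature layout (C x H x W) and the kernel weights are assumptions made for illustration; in a trained network the kernel would be learned.

```python
import math


def channel_means(feature):
    """Global average pooling: one scalar per channel of a C x H x W nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def conv1d_same(values, kernel):
    """1D cross-correlation along the channel axis with zero padding (odd kernel)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eca_attention(feature, kernel):
    """Scale each channel by a gate derived from nearby channels' pooled responses.

    `kernel` stands in for learned weights (hypothetical values in this sketch).
    Returns the rescaled feature map and the per-channel gates.
    """
    gates = [sigmoid(v) for v in conv1d_same(channel_means(feature), kernel)]
    scaled = [[[g * x for x in row] for row in ch] for ch, g in zip(feature, gates)]
    return scaled, gates


# Toy input: three 2 x 2 channels with constant values 0, 2 and 4.
toy_feature = [[[v, v], [v, v]] for v in (0.0, 2.0, 4.0)]
toy_scaled, toy_gates = eca_attention(toy_feature, kernel=[1.0, 1.0, 1.0])
print(toy_gates)
```

Because the gate for each channel depends only on a few neighbouring channels, the extra cost is a handful of multiplications per channel, which is consistent with the abstract's description of the module as lightweight.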
Key words:
- machine vision
- target tracking
- Siamese network
- channel attention
- context awareness
Table 1. Comparison of test results of different algorithms on the VOT2016 dataset

| Algorithm | Accuracy | Robustness | EAO |
|---|---|---|---|
| SiamCC (ours) | 0.61 | 0.16 | 0.448 |
| SPM | 0.62 | 0.21 | 0.434 |
| DaSiamRPN | 0.61 | 0.22 | 0.411 |
| ECO | 0.55 | 0.20 | 0.375 |
| C-RPN | 0.59 | 0.27 | 0.363 |
| SiamRPN | 0.56 | 0.26 | 0.344 |
| STRCF | 0.55 | | 0.313 |
| SASiam | 0.54 | 0.34 | 0.291 |
| MDNet | 0.51 | 0.25 | 0.283 |
| SiamFC | 0.53 | 0.46 | 0.235 |

Table 2. Comparison of test results of different algorithms on the VOT2018 dataset

| Algorithm | Accuracy | Robustness | EAO |
|---|---|---|---|
| SiamCC (ours) | 0.58 | 0.19 | 0.405 |
| SiamRPN++ | 0.60 | 0.23 | 0.414 |
| ATOM | 0.59 | 0.20 | 0.401 |
| SiamMask | 0.61 | 0.28 | 0.380 |
| SPM | 0.58 | 0.30 | 0.338 |
| DaSiamRPN | 0.56 | 0.34 | 0.326 |
| ECO | 0.48 | 0.27 | 0.280 |
| GradNet | 0.51 | 0.38 | 0.247 |
| SiamRPN | 0.49 | 0.46 | 0.244 |
| SASiam | 0.50 | 0.46 | 0.236 |

Table 3. Ablation experiment of the proposed algorithm on the OTB2013 and VOT2016 datasets

| Modules | OTB2013 success rate | VOT2016 EAO | Speed (frames/s) |
|---|---|---|---|
| None | 0.661 | 0.417 | 93 |
| ECANet | 0.665 | 0.420 | 91 |
| CCNet | 0.667 | 0.422 | 88 |
| ECAM | 0.669 | 0.426 | 91 |
| LCAM | 0.676 | 0.431 | 88 |
| ECANet + CCNet | 0.672 | 0.429 | 85 |
| ECAM + LCAM | 0.684 | 0.448 | 85 |
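To put the ablation numbers in Table 3 on a common scale, the short sketch below computes the relative change of the full configuration (ECAM + LCAM) against the row with no added module. The figures are copied from Table 3; the helper name `relative_change` and the dictionary layout are illustrative assumptions, not part of the paper.

```python
def relative_change(baseline, value):
    """Fractional change of `value` with respect to `baseline`."""
    return (value - baseline) / baseline


# (OTB2013 success rate, VOT2016 EAO, speed in frames/s), taken from Table 3.
no_module = (0.661, 0.417, 93)
ecam_lcam = (0.684, 0.448, 85)

labels = ("OTB2013 success rate", "VOT2016 EAO", "Speed")
changes = {
    label: relative_change(base, full)
    for label, base, full in zip(labels, no_module, ecam_lcam)
}

for label, change in changes.items():
    print(f"{label}: {change * 100:+.1f}%")
```

Read this way, the two modules together raise EAO on VOT2016 by roughly 7% in relative terms while costing roughly 9% of the tracking speed.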