Journal of Beijing University of Aeronautics and Astronautics, 2020, Vol. 46, Issue (12): 2302-2310. doi: 10.13700/j.bh.1001-5965.2019.0601


Siamese network visual tracking algorithm based on cascaded attention mechanism

PU Lei1, FENG Xinxi2, HOU Zhiqiang3, YU Wangsheng2, MA Sugang3

  1. Graduate College, Air Force Engineering University, Xi'an 710077, China;
    2. Institute of Information and Navigation, Air Force Engineering University, Xi'an 710077, China;
    3. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
  • Received: 2019-11-25 Published: 2020-12-28
  • Corresponding author: HOU Zhiqiang E-mail: hzq@xupt.edu.cn
  • About the authors: PU Lei, male, Ph.D. candidate, research interest: object tracking; FENG Xinxi, male, Ph.D., professor, research interest: information fusion; HOU Zhiqiang, male, Ph.D., professor, research interest: computer vision; YU Wangsheng, male, Ph.D., lecturer, research interest: pattern recognition; MA Sugang, male, Ph.D., associate professor, research interest: object tracking.
  • Funding:
    National Natural Science Foundation of China (61571458, 61703423)

Abstract: The fully convolutional Siamese network (SiamFC) tends to lose the target when similar objects cause interference or when the target undergoes large appearance changes. To address this problem, this paper proposes a Siamese network visual tracking algorithm based on a cascaded attention mechanism. First, a non-local attention module is added to the last layer of the network. It computes a self-attention feature map of the target region along the spatial dimension, and this map is added to the last-layer features. Second, because different feature channels respond differently to different targets and scenes, a channel attention module is introduced to weight the feature channels by importance. To further improve tracking robustness, the proposed model is combined with SiamFC through weighted fusion to produce the final response map. Finally, the proposed Siamese network model is trained jointly on the GOT10k and VID datasets to further improve its representational and discriminative power. Experimental results show that, compared with SiamFC, the proposed algorithm improves tracking precision by 9.3% and success rate by 5.4%.

Key words: visual tracking, Siamese network, non-local attention, channel attention, model integration
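The first step in the abstract has two parts: a non-local module that computes a spatial self-attention map, and the addition of that map to the last-layer features. The NumPy sketch below illustrates this with the embedded-Gaussian non-local block from Wang et al.'s non-local neural networks, applied to one (C, H, W) feature map. The abstract does not say which non-local variant the paper uses. The embedded-Gaussian form, the 1x1 convolutions written as plain matrices, and the names `non_local_block`, `w_theta`, `w_phi`, `w_g` and `w_out` are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block with a residual addition (assumed variant).

    x: feature map of shape (C, H, W).
    w_theta, w_phi, w_g: (C_inner, C) matrices standing in for 1x1 convolutions.
    w_out: (C, C_inner) matrix projecting the attended features back to C channels.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)             # (C, N) with N = H * W spatial positions
    theta = w_theta @ flat                 # (C_inner, N) query embeddings
    phi = w_phi @ flat                     # (C_inner, N) key embeddings
    g = w_g @ flat                         # (C_inner, N) value embeddings
    # Affinity between every pair of positions, normalised over keys for each query.
    attn = softmax(theta.T @ phi, axis=-1)  # (N, N)
    # Position i receives sum_j attn[i, j] * g[:, j].
    y = g @ attn.T                         # (C_inner, N)
    z = w_out @ y                          # (C, N) self-attention features
    # Residual addition: the attention output is added to the input features.
    return x + z.reshape(c, h, w)
```

If `w_out` is set to zeros, the block becomes an identity mapping. That is one way to let the module start from the backbone's original features before training.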
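The abstract says only that a channel attention module selects feature channels by importance; it does not describe the module's internals. The sketch below assumes a squeeze-and-excitation (SE) style design:

1. global average pooling per channel,
2. a two-layer fully connected bottleneck,
3. a sigmoid that produces one weight in (0, 1) per channel.

In the cascaded arrangement described in the abstract, this module's input would be the output of the non-local module. The function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, b1, w2, b2):
    """SE-style channel attention (an assumed design, not confirmed by the paper).

    x: (C, H, W) feature map; w1: (C_r, C), b1: (C_r,); w2: (C, C_r), b2: (C,),
    where C_r < C is the bottleneck width.
    Returns the reweighted features and the per-channel weights.
    """
    squeeze = x.mean(axis=(1, 2))                # (C,) global average pooling
    hidden = np.maximum(w1 @ squeeze + b1, 0.0)  # bottleneck FC layer + ReLU
    weights = sigmoid(w2 @ hidden + b2)          # (C,) channel importance in (0, 1)
    # Scale each channel of the input by its learned importance.
    return x * weights[:, None, None], weights
```

Because the weights come from a sigmoid, channels are softly suppressed or kept rather than hard-selected.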
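The abstract also describes a weighted fusion with SiamFC that produces the final response map. A minimal sketch at the level of response maps follows:

- `xcorr` performs SiamFC's cross-correlation: exemplar features are slid over search features and the products are summed over channels.
- `fuse_responses` blends the attention branch's response with the plain SiamFC response.

The fusion weight `lam` and its default of 0.5 are hypothetical, because the abstract does not give the value or say exactly how the weights are set.

```python
import numpy as np

def xcorr(z, x):
    """SiamFC-style cross-correlation of exemplar features z (C, h, w)
    over search features x (C, H, W), summed over channels.
    Returns a response map of shape (H - h + 1, W - w + 1)."""
    _, h, w = z.shape
    _, H, W = x.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            # Similarity score between the exemplar and this search window.
            out[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return out

def fuse_responses(r_att, r_base, lam=0.5):
    """Weighted fusion of two response maps.
    Hypothetical: lam scales the attention branch, (1 - lam) the SiamFC baseline."""
    return lam * r_att + (1.0 - lam) * r_base
```

The target's new position is then taken from the peak of the fused map, for example with `np.unravel_index(np.argmax(fused), fused.shape)`. In the original SiamFC configuration, 6×6 exemplar features correlated over 22×22 search features give a 17×17 response map.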


