
Multimodal bidirectional information enhancement network for RGBT tracking

ZHAO Wei, LIU Lei, WANG Kunpeng, TU Zhengzheng, LUO Bin

Citation: ZHAO W, LIU L, WANG K P, et al. Multimodal bidirectional information enhancement network for RGBT tracking[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 596-605 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0395


doi: 10.13700/j.bh.1001-5965.2022.0395
Funds: National Natural Science Foundation of China (62376005); Anhui Provincial Key Research and Development Program (202104d07020008, KJ2020A0033); Anhui Provincial Natural Science Foundation (2108085MF211); University Synergy Innovation Program of Anhui Province (GXXT-2022-014)
Corresponding author: E-mail: zhengzhengahu@163.com

CLC number: TP183

Abstract:

RGB-thermal (RGBT) tracking aims to exploit the complementary advantages of visible and thermal infrared data to achieve robust object tracking. Mainstream methods usually introduce modality weights to fuse multimodal information, but simply assigning a weight to each modality cannot fully exploit the complementary advantages of the visible and thermal modalities. To address this, a multimodal bidirectional information enhancement network for RGBT tracking (MBIENet) is proposed. First, a feature aggregation module (FAM) is designed to aggregate modality-shared and modality-specific features for modeling target appearance. Second, a novel multimodal bidirectional modulation fusion (MBMF) module is proposed to effectively fuse complementary information across modalities while reducing the influence of redundant and useless features on the tracker. Third, a lightweight channel-spatial attention (CSA) module is proposed to adaptively adjust the contribution of each modality under different environments. Experimental results on the GTOT, RGBT234, and LasHeR datasets show that the proposed tracker outperforms current mainstream trackers in both precision and success rate.
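To make the bidirectional modulation idea concrete, here is a minimal PyTorch sketch in the spirit of FiLM-style conditioning [21]: each modality predicts per-channel scale and shift parameters that modulate the features of the other modality before the two enhanced streams are fused. The class name, layer choices, and sigmoid gating are illustrative assumptions, not the paper's actual MBMF implementation.

```python
import torch
import torch.nn as nn

class BidirectionalModulation(nn.Module):
    """Hypothetical FiLM-style bidirectional fusion sketch: each modality
    generates scale/shift parameters that modulate the other modality."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions predicting per-channel scale (gamma) and shift (beta)
        self.rgb_to_tir = nn.Conv2d(channels, 2 * channels, kernel_size=1)
        self.tir_to_rgb = nn.Conv2d(channels, 2 * channels, kernel_size=1)

    def forward(self, f_rgb: torch.Tensor, f_tir: torch.Tensor) -> torch.Tensor:
        # RGB features modulate the thermal stream ...
        gamma_t, beta_t = self.rgb_to_tir(f_rgb).chunk(2, dim=1)
        f_tir_mod = f_tir * torch.sigmoid(gamma_t) + beta_t
        # ... and thermal features modulate the RGB stream
        gamma_r, beta_r = self.tir_to_rgb(f_tir).chunk(2, dim=1)
        f_rgb_mod = f_rgb * torch.sigmoid(gamma_r) + beta_r
        # Fuse the mutually enhanced features
        return f_rgb_mod + f_tir_mod

# Example: fuse a pair of 512-channel feature maps
# fuse = BidirectionalModulation(512)
# out = fuse(torch.randn(1, 512, 24, 24), torch.randn(1, 512, 24, 24))
```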

     

Figure 1. MBIENet framework
Figure 2. Example of modality unreliability
Figure 3. Structure of the feature aggregation module
Figure 4. Structure of the multimodal bidirectional modulation fusion module
Figure 5. Feature map visualization
Figure 6. Structure of the channel-spatial attention module
Figure 7. Evaluation curves on GTOT
Figure 8. Evaluation curves on RGBT234
Figure 9. Evaluation curves on LasHeR
Figure 10. Comparison of tracking results

Table 1. Ablation experiment results (PR: precision rate; SR: success rate)

Module       | PR (RGBT234) | PR (LasHeR) | PR (GTOT) | SR (RGBT234) | SR (LasHeR) | SR (GTOT)
MBIENet-FAM  | 0.818        | 0.477       | 0.892     | 0.568        | 0.348       | 0.707
MBIENet-MBMF | 0.813        | 0.471       | 0.886     | 0.560        | 0.342       | 0.701
MBIENet-CSA  | 0.822        | 0.479       | 0.899     | 0.574        | 0.347       | 0.718
MBIENet      | 0.829        | 0.484       | 0.902     | 0.582        | 0.355       | 0.720

Table 2. Comparison of attention mechanisms

Module         | PR (GTOT) | PR (RGBT234) | SR (GTOT) | SR (RGBT234) | Speed/(frame·s⁻¹)
MBIENet Concat | 0.896     | 0.822        | 0.709     | 0.574        | 1.8
MBIENet CBAM   | 0.897     | 0.824        | 0.714     | 0.576        | 1.38
MBIENet SE     | 0.899     | 0.825        | 0.718     | 0.580        | 1.6
MBIENet CSA    | 0.902     | 0.829        | 0.720     | 0.582        | 1.54
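As context for the comparison above, the following is a hypothetical sketch of what a lightweight channel-spatial attention block in the CBAM/SE family [23, 30] can look like: squeeze-and-excitation style channel reweighting followed by a spatial attention map computed from pooled channel statistics. The structure, names, and reduction ratio are assumptions for illustration; the paper's actual CSA module is specified in Fig. 6.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Hypothetical CBAM-like sketch: channel attention followed by spatial
    attention, usable for reweighting modality contributions."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: global average pooling + bottleneck MLP
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 convolution over pooled channel statistics
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)            # per-channel reweighting
        avg_map = x.mean(dim=1, keepdim=True)   # pooled spatial statistics
        max_map = x.amax(dim=1, keepdim=True)
        x = x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))
        return x
```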
References

[1] BERTINETTO L, VALMADRE J, HENRIQUES J, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2016: 850-865.
    [2] GE S M, LUO Z, ZHANG C H, et al. Distilling channels for efficient deep tracking[J]. IEEE Transactions on Image Processing, 2020, 29: 2610-2621.
    [3] ZHANG J M, JIN X K, SUN J, et al. Spatial and semantic convolutional features for robust visual object tracking[J]. Multimedia Tools and Applications, 2020, 79(21): 15095-15115.
    [4] LIU Q, LU X H, HE Z Y, et al. Deep convolutional neural networks for thermal infrared object tracking[J]. Knowledge-Based Systems, 2017, 134: 189-198. doi: 10.1016/j.knosys.2017.07.032
    [5] ZHANG L C, GONZALEZ-GARCIA A, DANELLJAN M, et al. Synthetic data generation for end-to-end thermal infrared tracking[J]. IEEE Transactions on Image Processing, 2019, 28(4): 1837-1850.
    [6] LIU Q, LI X, HE Z Y, et al. Learning deep multi-level similarity for thermal infrared object tracking[J]. IEEE Transactions on Multimedia, 2021, 23: 2114-2126.
[7] SUN T X, SHAO Y F, LI X N, et al. Learning sparse sharing architectures for multiple tasks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 8936-8943.
    [8] GAO Y, LI C L, ZHU Y B, et al. Deep adaptive fusion network for high performance RGBT tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 91-99.
    [9] LI C L, LIU L, LU A D, et al. Challenge-aware RGBT tracking[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 222-237.
    [10] ZHANG P Y, WANG D, LU H C, et al. Learning adaptive attribute-driven representation for real-time RGB-T tracking[J]. International Journal of Computer Vision, 2021, 129(9): 2714-2729. doi: 10.1007/s11263-021-01495-3
    [11] XIAO Y, YANG M M, LI C L, et al. Attribute-based progressive fusion network for RGBT tracking[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 2831-2838.
[12] LI C L, LU A D, ZHENG A H, et al. Multi-adapter RGBT tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 2262-2270.
    [13] XU Q, MEI Y M, LIU J P, et al. Multimodal cross-layer bilinear pooling for RGBT tracking[J]. IEEE Transactions on Multimedia, 2022, 24: 567-580.
    [14] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4293-4302.
    [15] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2009: 248-255.
[16] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the International Conference on Learning Representations, 2015.
    [17] LI C L, CHENG H, HU S Y, et al. Learning collaborative sparse representation for grayscale-thermal tracking[J]. IEEE Transactions on Image Processing, 2016, 25(12): 5743-5756. doi: 10.1109/TIP.2016.2614135
    [18] LI C L, ZHAO N, LU Y J, et al. Weighted sparse representation regularized graph learning for RGB-T object tracking[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 1856-1864.
    [19] LI C L, SUN X, WANG X, et al. Grayscale-thermal object tracking via multitask Laplacian sparse representation[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(4): 673-681.
    [20] ZHU Y B, LI C L, TANG J, et al. Quality-aware feature aggregation network for robust RGBT tracking[J]. IEEE Transactions on Intelligent Vehicles, 2020, 6(1): 121-130.
[21] PEREZ E, STRUB F, DE VRIES H, et al. FiLM: Visual reasoning with a general conditioning layer[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 3942-3951.
    [22] WANG X, SHU X J, ZHANG S L, et al. MFGNet: Dynamic modality-aware filter generation for RGB-T tracking[J]. IEEE Transactions on Multimedia, 2023, 25: 4335-4348.
    [23] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 3-19.
    [24] LI C L, LIANG X Y, LU Y J, et al. RGB-T object tracking: Benchmark and baseline[J]. Pattern Recognition, 2019, 96: 106977. doi: 10.1016/j.patcog.2019.106977
    [25] LI C L, XUE W L, JIA Y Q, et al. LasHeR: A large-scale high-diversity benchmark for RGBT tracking[J]. IEEE Transactions on Image Processing, 2022, 31: 392-404.
    [26] TU Z Z, LIN C, ZHAO W, et al. M5L: Multi-modal multi-margin metric learning for RGBT tracking[J]. IEEE Transactions on Image Processing, 2022, 31: 85-98.
    [27] WANG C Q, XU C Y, CUI Z, et al. Cross-modal pattern-propagation for RGB-T tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 7064-7073.
    [28] ZHANG H, ZHANG L, ZHUO L, et al. Object tracking in RGB-T videos using modal-aware attention network and competitive learning[J]. Sensors, 2020, 20(2): 393. doi: 10.3390/s20020393
    [29] ZHANG L C, DANELLJAN M, GONZALEZ-GARCIA A, et al. Multi-modal fusion for end-to-end RGB-T tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 2252-2261.
    [30] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[31] ZHU Y B, LI C L, LUO B, et al. Dense feature aggregation and pruning for RGBT tracking[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019: 465-472.
Publication history
Received: 2022-05-20
Accepted: 2022-07-02
Published online: 2022-10-18
Issue date: 2024-02-27
