Volume 50 Issue 2
Feb. 2024
ZHAO W, LIU L, WANG K P, et al. Multimodal bidirectional information enhancement network for RGBT tracking[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 596-605 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0395

Multimodal bidirectional information enhancement network for RGBT tracking

doi: 10.13700/j.bh.1001-5965.2022.0395
Funds:  National Natural Science Foundation of China (62376005); Anhui Provincial Key Research and Development Program (202104d07020008,KJ2020A0033); Anhui Provincial Natural Science Foundation (2108085MF211); University Synergy Innovation Program of Anhui Province (GXXT-2022-014)
  • Corresponding author: E-mail: zhengzhengahu@163.com
  • Received Date: 20 May 2022
  • Accepted Date: 02 Jul 2022
  • Available Online: 31 Oct 2022
  • Publish Date: 18 Oct 2022
  • Abstract: The goal of RGB-thermal infrared (RGBT) visual object tracking, which has drawn increasing interest in recent years, is to exploit the complementary strengths of RGB and thermal infrared image data to achieve reliable visual tracking. To obtain a robust appearance representation of an object, existing mainstream methods introduce modal weights to fuse the information of the two modalities. However, simply assigning a weight to each modality cannot fully exploit the complementary benefits of the RGB and thermal infrared modalities. To address this problem, we propose a novel multimodal bidirectional information enhancement network for RGBT tracking (MBIENet). Specifically, we design a feature aggregation module that aggregates modality-shared and modality-specific features to model the appearance of an object. We further propose a novel multimodal bidirectional modulation fusion module that effectively fuses the complementary information of the two modalities and alleviates the impact of redundant and useless features on the tracker. Finally, a proposed lightweight channel-spatial attention module adaptively adjusts the contributions of the two modalities under different conditions. Experimental results on the GTOT, RGBT234, and LasHeR datasets show that the accuracy rate and success rate of the proposed method surpass those of existing mainstream trackers.
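The abstract does not specify how the bidirectional modulation fusion module is implemented. As an illustration only, the NumPy sketch below shows one plausible FiLM-style bidirectional scheme in which each modality generates bounded per-channel scale and shift terms that modulate the other, with a residual connection preserving the original features. All names here (`modulate`, `bidirectional_fusion`, `w_gamma`, `w_beta`) are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modulate(target, source, w_gamma, w_beta):
    # Hypothetical FiLM-style modulation: the source modality produces
    # per-channel scale (gamma) and shift (beta) terms from a pooled
    # channel descriptor, then recalibrates the target modality.
    desc = source.mean(axis=(1, 2))       # (C,) global average descriptor
    gamma = sigmoid(w_gamma @ desc)       # bounded scale: can suppress noisy channels
    beta = np.tanh(w_beta @ desc)         # bounded shift
    return target * gamma[:, None, None] + beta[:, None, None]

def bidirectional_fusion(rgb, tir, w_gamma, w_beta):
    # Each modality modulates the other ("bidirectional"); residual
    # connections keep the original modality-specific information.
    rgb_enhanced = rgb + modulate(rgb, tir, w_gamma, w_beta)
    tir_enhanced = tir + modulate(tir, rgb, w_gamma, w_beta)
    return rgb_enhanced + tir_enhanced    # fused (C, H, W) representation
```

In an actual tracker these weights would be learned convolutional layers inside the network; the point of the sketch is only the data flow: cross-modal descriptors gate each other's feature maps in both directions before fusion.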

     


    Figures(10)  / Tables(2)
