Volume 50, Issue 2, Feb. 2024
Citation: LIANG H, WANG H R, WANG D. Cross-modal hashing network based on self-attention similarity transfer[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 615-622 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0402

Cross-modal hashing network based on self-attention similarity transfer

doi: 10.13700/j.bh.1001-5965.2022.0402
Funds: Ningxia Hui Autonomous Region Department of Education Higher Education Key Project of Scientific Research (NYG2022051); Ningxia Natural Science Foundation Project (2023AAC03316)
More Information
  • Corresponding author: E-mail: bmdwhr@163.com
  • Received Date: 21 May 2022
  • Accepted Date: 02 Jul 2022
  • Available Online: 31 Oct 2022
  • Publish Date: 21 Oct 2022
  • Abstract: To further improve cross-modal retrieval performance, a cross-modal hashing network based on self-attention similarity transfer is proposed. A channel-spatial hybrid self-attention mechanism is designed to emphasize key information in images, and a co-attention method is used to strengthen the interaction between modalities, thereby improving the quality of feature learning. To reconstruct the similarity relationship in the hash space, a transfer learning method uses the real-valued space similarity to guide the generation of hash codes. Comparative experiments against strong methods such as deep cross-modal hashing (DCMH), pairwise relationship guided deep hashing (PRDH), and cross-modal Hamming hashing (CMHH) are carried out on three commonly used datasets: MIRFLICKR-25K, IAPRTC-12, and MSCOCO. The results show that at a hash code length of 64 bits, the mean average precision (MAP) on the three datasets reaches 72.3% for retrieving text with image queries and 70% for retrieving images with text queries, both higher than those of the compared methods.
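This page does not reproduce the model's architecture, but the channel-spatial hybrid self-attention described in the abstract belongs to the same family as CBAM-style modules [21]. The following PyTorch sketch shows one plausible form of such a block; the class name, reduction ratio, and kernel size are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Hypothetical channel + spatial hybrid attention block (CBAM-style sketch)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel branch: a shared MLP excites channels from pooled descriptors.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial branch: a 7x7 conv scores each location from pooled channel maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention: combine average- and max-pooled statistics.
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1)
        # Spatial attention: highlight key image regions.
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        sa = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * sa
```

The channel branch reweights feature maps as a whole, while the spatial branch then emphasizes the key image regions the abstract refers to.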

     
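The similarity-transfer step, using real-valued space similarity to guide hash code generation, can be read as aligning the pairwise similarity matrix of continuous features with that of relaxed hash codes. The loss below is a minimal sketch of that reading, not the paper's exact objective; the function name, tanh relaxation, and MSE alignment are assumptions.

```python
import torch
import torch.nn.functional as F

def similarity_transfer_loss(real_feats: torch.Tensor,
                             hash_logits: torch.Tensor) -> torch.Tensor:
    """Push hash-space similarities toward real-valued feature similarities.

    real_feats:  (B, D) continuous features, e.g. from the attention backbone.
    hash_logits: (B, K) pre-binarization outputs; tanh yields codes in (-1, 1).
    """
    # Real-valued similarity matrix in [-1, 1] via cosine similarity.
    f = F.normalize(real_feats, dim=1)
    s_real = f @ f.t()
    # Relaxed hash codes and their scaled inner-product similarity in [-1, 1].
    b = torch.tanh(hash_logits)
    s_hash = (b @ b.t()) / b.shape[1]
    # Transfer: match the two similarity structures.
    return F.mse_loss(s_hash, s_real)
```

At retrieval time the relaxed codes would be binarized, e.g. `torch.sign(torch.tanh(hash_logits))`, so that the transferred similarity structure survives in the Hamming space.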

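The reported MAP scores follow the standard retrieval definition: precision is computed at each relevant item's rank, averaged per query, then averaged over all queries. Below is a small NumPy sketch for Hamming-ranked retrieval with multi-label ground truth (sharing any label counts as relevant); the argument names are hypothetical.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """MAP over Hamming-ranked retrieval; codes are +/-1 arrays, labels multi-hot."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        # Hamming distance from +/-1 inner products: d = (K - <b_i, q>) / 2.
        hamming = 0.5 * (db_codes.shape[1] - db_codes @ q)
        order = np.argsort(hamming)
        relevant = (db_labels[order] @ ql) > 0   # shares at least one label
        if relevant.sum() == 0:
            continue
        ranks = np.where(relevant)[0] + 1        # 1-based ranks of the hits
        precision_at_hit = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precision_at_hit.mean())
    return float(np.mean(aps))
```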
  • [1]
    SHAO J. Cross-modal retrieval based on deep learning[D]. Beijing: Beijing University of Posts and Telecommunications, 2017: 10-13 (in Chinese).
    [2]
    ZHANG Y, OU W, ZHANG J, et al. Category supervised cross-modal hashing retrieval for chest X-ray and radiology reports[J]. Computers and Electrical Engineering, 2022, 98: 107673.
    [3]
    LI K, QI G J, YE J, et al. Linear subspace ranking hashing for cross-modal retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(9): 1825-1838.
    [4]
    LI C, DENG C, LI N, et al. Self-supervised adversarial hashing networks for cross-modal retrieval[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4242-4251.
    [5]
    WU J, WENG W, FU J, et al. Deep semantic hashing with dual attention for cross-modal retrieval[J]. Neural Computing and Applications, 2022, 34(7): 5397-5416.
    [6]
    HUANG Z, WANG X, HUANG L, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 603-612.
    [7]
    LI X, WANG W, HU X, et al. Selective kernel networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 510-519.
    [8]
    YU C, WANG J, PENG C, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2018: 325-341.
    [9]
    FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3146-3154.
    [10]
    LIU X H, CAO G T, LIN Q B, et al. Adaptive hybrid attention hashing for deep cross-modal retrieval[J]. Journal of Computer Applications, 2022, 42(12): 3663-3670 (in Chinese).
    [11]
    WU G, LIN Z, HAN J, et al. Unsupervised deep hashing via binary latent factor models for large-scale cross-modal retrieval[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence. New York: ACM, 2018: 2854-2860.
    [12]
    KAUR P, PANNU H S, MALHI A K. Comparative analysis on cross-modal information retrieval: A review[J]. Computer Science Review, 2021, 39: 100336.
    [13]
    LI C, DENG C, WANG L, et al. Coupled CycleGAN: Unsupervised hashing network for cross-modal retrieval[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 176-183.
    [14]
    CHENG S, WANG L, DU A. Deep semantic-preserving reconstruction hashing for unsupervised cross-modal retrieval[J]. Entropy, 2020, 22(11): 1266.
    [15]
    KANG P P, LIN Z H, YANG Z G, et al. Pairwise similarity transferring hash for unsupervised cross-modal retrieval[J]. Application Research of Computers, 2021, 38(10): 3025-3029 (in Chinese).
    [16]
    WANG D, WANG Q, HE L, et al. Joint and individual matrix factorization hashing for large-scale cross-modal retrieval[J]. Pattern Recognition, 2020, 107: 107479.
    [17]
    GUO M H, XU T X, LIU J J, et al. Attention mechanisms in computer vision: A survey[J]. Computational Visual Media, 2022, 8(3): 331-368. doi: 10.1007/s41095-022-0271-y
    [18]
    HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
    [19]
    WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
    [20]
    GUO J, MA X, SANSOM A, et al. SPANet: Spatial pyramid attention network for enhanced image recognition[C]//Proceedings of the IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2020: 1-6.
    [21]
    WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
    [22]
    HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13713-13722.
    [23]
    JIANG Q Y, LI W J. Deep cross-modal hashing[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 3232-3240.
    [24]
    YANG E, DENG C, LIU W, et al. Pairwise relationship guided deep hashing for cross-modal retrieval[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2017: 1618-1625.
    [25]
    CAO Y, LIU B, LONG M, et al. Cross-modal Hamming hashing[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2018: 202-218.
    [26]
    WANG X, ZOU X, BAKKER E M, et al. Self-constraining and attention-based hashing network for bit-scalable cross-modal retrieval[J]. Neurocomputing, 2020, 400: 255-271. doi: 10.1016/j.neucom.2020.03.019
    [27]
    CAO Y, LONG M, WANG J, et al. Correlation hashing network for efficient cross-modal retrieval[C]//Proceedings of the British Machine Vision Conference. London: British Machine Vision Association, 2017: 1-7.
    [28]
    LI C, DENG C, LI N, et al. Self-supervised adversarial hashing networks for cross-modal retrieval[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4242-4251.