Image-text aspect emotion recognition based on joint aspect attention interaction

ZHAO Yicheng, WANG Suge, LIAO Jian, HE Donghuan

Citation: ZHAO Y C, WANG S G, LIAO J, et al. Image-text aspect emotion recognition based on joint aspect attention interaction[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 569-578 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0387


doi: 10.13700/j.bh.1001-5965.2022.0387
Funds: National Natural Science Foundation of China (62076158, 61906112); Project of Science and Technology Bureau of Xiaodian District, Taiyuan City, Shanxi Province (2020XDCXY05)

Corresponding author. E-mail: wsg@sxu.edu.cn

  • CLC number: TP391

  • Abstract:

    With the rapid development of multimedia, aspect-category sentiment analysis based on text alone can no longer accurately identify the sentiment expressed by users. Meanwhile, existing aspect-category sentiment analysis methods for image-text data only consider the interaction between the image and text modalities, ignoring the inconsistency and correlation of image-text data. Therefore, an image-text aspect-category sentiment recognition method based on the joint aspect attention interaction network (JAAIN) model is proposed. To address the inconsistency and correlation of image-text data, the proposed method fuses aspect information and image-text information at multiple levels, removes text and images irrelevant to the given aspect, and enhances the sentiment representation of the image and text modalities for that aspect. The text sentiment representation, image sentiment representation, and aspect-category sentiment representation are then concatenated, fused, and passed through a fully connected layer to realize image-text aspect-category sentiment classification. Experiments on the Multi-ZOL dataset show that the proposed model improves the performance of image-text aspect-category sentiment classification.
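
    The abstract outlines the model's core pipeline: aspect-guided attention suppresses aspect-irrelevant text and image content, and the resulting text, image, and aspect representations are concatenated and classified through a fully connected layer. The following PyTorch-style sketch illustrates that idea only; the dimensions, module names, and single-layer attention form are simplifying assumptions on our part, not the authors' actual JAAIN implementation (see Fig. 1 for the real architecture).

    import torch
    import torch.nn as nn

    class JAAINSketch(nn.Module):
        """Illustrative sketch (not the authors' code): the aspect embedding
        queries each modality so aspect-irrelevant tokens/regions get low
        attention weight, then [text; image; aspect] is concatenated and
        fed to a fully connected classifier."""

        def __init__(self, dim=768, num_classes=10, num_heads=8):
            super().__init__()
            # Aspect-guided cross-attention over each modality.
            self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.image_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            # Fusion head: concatenation followed by a fully connected layer.
            self.classifier = nn.Linear(3 * dim, num_classes)

        def forward(self, text_feats, image_feats, aspect_emb):
            # text_feats:  (B, T, dim) token-level text features
            # image_feats: (B, R, dim) region-level image features
            # aspect_emb:  (B, 1, dim) embedding of the given aspect
            text_rep, _ = self.text_attn(aspect_emb, text_feats, text_feats)
            img_rep, _ = self.image_attn(aspect_emb, image_feats, image_feats)
            fused = torch.cat([text_rep, img_rep, aspect_emb], dim=-1).squeeze(1)
            return self.classifier(fused)  # (B, num_classes) sentiment logits

    For instance, under the Multi-ZOL setting in Table 2 (10 sentiment labels), num_classes=10, and the model produces logits over the 10 classes for each (review, aspect) pair.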

     

  • Figure 1.  Overall framework of the JAAIN model

    Figure 2.  Influence of the number of Transformer-encoder layers in the text representation

    Figure 3.  Influence of the number of Transformer-encoder layers in capturing intra-modality information

    Figure 4.  Visualization of the aspect direct interaction mechanism for Example 1

    Figure 5.  Visualization of the aspect deep interaction mechanism for Example 2

    Figure 6.  Visualization of the aspect deep interaction mechanism for Example 3

    Table 1.  Examples of inconsistency and correlation in image-text data (the image accompanying each example is not reproduced in this text version)

    Example 1. Aspect: camera performance. Sentiment label: negative. Text: The phone is quite good: the black color is cool, the display is nice, and there is a full-time eye-protection mode; the appearance is beautiful and it runs smoothly. The input method takes some getting used to, and the battery definitely needs one charge a day. I'm unhappy that buying it from the official Taobao flagship store came without free earphones; it feels unfair! Small battery capacity is its one real drawback.

    Example 2. Aspect: camera performance. Sentiment label: positive. Text: 1) The appearance is low-key, steady, and restrained, with a strong business feel, much loved by men; 2) split-screen, display, and color all perform well at this price point; 3) it offers posing guidance for photos and supports a front-facing soft light; 4) battery life is good, comfortably lasting a full day; 5) it supports the sound profiles of several well-known earphone brands.

    Example 3. Aspect: camera performance. Sentiment label: positive. Text: High specifications and strong performance. A top-class camera with excellent image quality. No fast-charging support, so charging is slow. The rear camera is a highlight of the iPhone SE, carrying over the top-tier setup of the iPhone 6s; although it lacks optical image stabilization, it still excites me.

    Table 2.  Statistics of the Multi-ZOL dataset

    Attribute  Value
    Number of reviews  5228
    Number of labels  10
    Average words per review  315.11
    Maximum words per review  8511
    Minimum words per review  5
    Average images per review  4.5
    Maximum images per review  111
    Minimum images per review  1

    Table 3.  Comparative experimental results

    Data type  Model  Precision/%  F1/%
    Text  LSTM[20]  58.92  57.29
    Text  MemNet[22]  59.51  58.73
    Text  ATAE-LSTM[21]  59.58  58.95
    Text  IAN[23]  60.08  59.47
    Text  RAM[24]  60.18  59.68
    Text+image  TomBERT[4]  59.35  58.40
    Text+image  Co-Memory+Aspect[3]  60.43  59.74
    Text+image  MIMN[2]  61.59  60.51
    Text+image  EF-CapTrBERT[25]  65.67  64.99
    Text+image  MIMN+BERT+CNN152  68.77  68.53
    Text+image  JAAIN  74.57  74.48

    Table 4.  Comparison results of ablation experiments

    Model  Precision/%  F1/%
    JAAIN  74.57  74.48
    -JAAIN(image)  68.04  67.80
    -JAAIN(text)  39.32  30.36
    -DAIMA  72.21  72.20
    -ADIMA  73.20  73.09
    -Transformer-encoder  73.27  73.28
  • [1] PONTIKI M, GALANIS D, PAVLOPOULOS J, et al. SemEval-2014 task 4: Aspect based sentiment analysis[C]//Proceedings of the 8th International Workshop on Semantic Evaluation. Stroudsburg: Association for Computational Linguistics, 2014: 27-35.
    [2] XU N, MAO W J, CHEN G D. Multi-interactive memory network for aspect-based multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Washington, D.C.: AAAI, 2019, 33(1): 371-378.
    [3] XU N, MAO W J, CHEN G D. A co-memory network for multimodal sentiment analysis[C]//Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. New York: ACM, 2018: 929-932.
    [4] YU J F, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2019: 5408-5414.
    [5] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
    [6] GU S Q, ZHANG L P, HOU Y X, et al. A position-aware bidirectional attention network for aspect-level sentiment analysis[C]//Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe: Curran Associates, Inc., 2018: 774-784.
    [7] XU Q N, ZHU L, DAI T, et al. Aspect-based sentiment classification with multi-attention network[J]. Neurocomputing, 2020, 388: 135-143. doi: 10.1016/j.neucom.2020.01.024
    [8] WU C, XIONG Q Y, GAO M, et al. A relative position attention network for aspect-based sentiment analysis[J]. Knowledge and Information Systems, 2021, 63(2): 333-347. doi: 10.1007/s10115-020-01512-w
    [9] LI Y, ZENG J B, SHAN S G, et al. Occlusion aware facial expression recognition using CNN with attention mechanism[J]. IEEE Transactions on Image Processing, 2019, 28: 2439-2450. doi: 10.1109/TIP.2018.2886767
    [10] XIE S Y, HU H F, WU Y B. Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition[J]. Pattern Recognition, 2019, 92: 177-191. doi: 10.1016/j.patcog.2019.03.019
    [11] ZHAO S C, GAO Y, JIANG X L, et al. Exploring principles-of-art features for image emotion recognition[C]//Proceedings of the 22nd ACM International Conference on Multimedia. New York: ACM, 2014: 47-56.
    [12] RAO T R, LI X X, XU M. Learning multi-level deep representations for image emotion classification[J]. Neural Processing Letters, 2020, 51(3): 2043-2061. doi: 10.1007/s11063-019-10033-9
    [13] PORIA S, CHATURVEDI I, CAMBRIA E, et al. Convolutional MKL based multimodal emotion recognition and sentiment analysis[C]//Proceedings of the 2016 IEEE 16th International Conference on Data Mining. Piscataway: IEEE Press, 2016: 439-448.
    [14] CAO D L, JI R R, LIN D Z, et al. A cross-media public sentiment analysis system for microblog[J]. Multimedia Systems, 2016, 22(4): 479-486. doi: 10.1007/s00530-014-0407-8
    [15] TRUONG Q T, LAUW H W. VistaNet: Visual aspect attention network for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Washington, D.C.: AAAI, 2019, 33(1): 305-312.
    [16] XU J, HUANG F R, ZHANG X M, et al. Visual-textual sentiment classification with bi-directional multi-level attention networks[J]. Knowledge-Based Systems, 2019, 178: 61-73. doi: 10.1016/j.knosys.2019.04.018
    [17] LU J S, BATRA D, PARIKH D, et al. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks[EB/OL]. (2019-08-06) [2022-01-03]. https://arxiv.org/abs/1908.02265.
    [18] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: Association for Computational Linguistics, 2019: 4171-4186.
    [19] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving neural networks by preventing co-adaptation of feature detectors[EB/OL]. (2012-07-03). https://arxiv.org/abs/1207.0580.
    [20] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
    [21] WANG Y Q, HUANG M L, ZHU X Y, et al. Attention-based LSTM for aspect-level sentiment classification[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2016: 606-615.
    [22] TANG D Y, QIN B, LIU T. Aspect level sentiment classification with deep memory network[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2016: 214-224.
    [23] MA D H, LI S J, ZHANG X D, et al. Interactive attention networks for aspect-level sentiment classification[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence. New York: ACM, 2017: 4068-4074.
    [24] CHEN P, SUN Z Q, BING L D, et al. Recurrent attention network on memory for aspect sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2017: 452-461.
    [25] KHAN Z, FU Y. Exploiting BERT for multimodal target sentiment classification through input space translation[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3034-3042.
Publication history
  • Received: 2022-05-19
  • Accepted: 2022-07-02
  • Published online: 2022-10-18
  • Issue date: 2024-02-27
