Abstract: With the rapid development of multimedia, aspect-category sentiment analysis based on text alone can no longer reliably identify the sentiment users express. Existing aspect-category sentiment analysis methods for image-text data consider only the interaction between the image and text modalities, ignoring the inconsistency and the correlation between them. This paper therefore proposes a joint aspect attention interaction network (JAAIN) model for image-text aspect-category sentiment identification. To address the inconsistency and correlation of image-text data, the proposed method fuses aspect information with image and text information at multiple levels, removing text and images unrelated to the given aspect and thereby enhancing the sentiment representation of both modalities for that aspect. The text sentiment representation, the image sentiment representation, and the aspect-category sentiment representation are then concatenated, fused, and passed through a fully connected layer to discriminate the sentiment of the image-text pair. Experiments on the Multi-ZOL dataset show that the proposed model improves the performance of image-text aspect-category sentiment discrimination.
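To make the pipeline concrete, the following is a minimal PyTorch sketch of the kind of aspect-guided attention filtering and concatenation-based fusion head the abstract describes. It is an illustration under assumptions rather than the authors' implementation: the module names (AspectGuidedAttention, JAAINHead), the feature dimensions, and the single-query dot-product attention are hypothetical; only the 10-way output matches the label count reported in Table 2.

```python
# Hypothetical sketch of aspect-guided filtering and concatenation fusion,
# not the authors' released JAAIN code. Dimensions and module names are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AspectGuidedAttention(nn.Module):
    """Scores each text token / image feature against the aspect embedding,
    so features unrelated to the given aspect are down-weighted."""

    def __init__(self, feat_dim: int, aspect_dim: int):
        super().__init__()
        self.proj = nn.Linear(aspect_dim, feat_dim)

    def forward(self, feats: torch.Tensor, aspect: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq, feat_dim); aspect: (batch, aspect_dim)
        query = self.proj(aspect).unsqueeze(1)            # (batch, 1, feat_dim)
        scores = torch.bmm(query, feats.transpose(1, 2))  # (batch, 1, seq)
        weights = F.softmax(scores / feats.size(-1) ** 0.5, dim=-1)
        return torch.bmm(weights, feats).squeeze(1)       # (batch, feat_dim)


class JAAINHead(nn.Module):
    """Concatenates aspect-filtered text and image representations with the
    aspect embedding and classifies sentiment with a fully connected layer."""

    def __init__(self, text_dim=768, image_dim=2048, aspect_dim=300, num_classes=10):
        super().__init__()
        self.text_att = AspectGuidedAttention(text_dim, aspect_dim)
        self.image_att = AspectGuidedAttention(image_dim, aspect_dim)
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim + aspect_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(512, num_classes),
        )

    def forward(self, text_feats, image_feats, aspect):
        text_repr = self.text_att(text_feats, aspect)     # aspect-relevant text
        image_repr = self.image_att(image_feats, aspect)  # aspect-relevant images
        fused = torch.cat([text_repr, image_repr, aspect], dim=-1)
        return self.classifier(fused)                     # sentiment logits


if __name__ == "__main__":
    # Random features stand in for BERT token embeddings and CNN image features.
    head = JAAINHead()
    text = torch.randn(2, 64, 768)    # 64 text tokens per review
    images = torch.randn(2, 5, 2048)  # 5 image feature vectors per review
    aspect = torch.randn(2, 300)      # embedding of the given aspect
    print(head(text, images, aspect).shape)  # torch.Size([2, 10])
```

Using the aspect embedding as the attention query for both modalities is one straightforward way to down-weight text tokens and image features unrelated to the given aspect before the representations are concatenated and classified.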
Table 1. Examples of inconsistency and correlation in image-text data

Instance 1 (aspect: camera performance; sentiment label: negative; accompanying images omitted)
Text: The phone is quite good. The black finish is cool, the display is nice, and there is an always-on eye-protection mode. It looks beautiful and runs smoothly. The input method takes some getting used to, and the battery definitely needs charging once a day. It is also annoying that the official Taobao flagship store did not include earphones with the purchase. The small battery capacity is its one real shortcoming.

Instance 2 (aspect: camera performance; sentiment label: positive; accompanying images omitted)
Text: 1) The appearance is understated, steady, and restrained, with a strong business feel, well liked by men; 2) the split screen, display, and color perform well at this price point; 3) it offers pose guidance for photos and has a front-facing soft fill light; 4) battery life is good and comfortably lasts a full day; 5) it supports the audio effects of several well-known earphone brands.

Instance 3 (aspect: camera performance; sentiment label: positive; accompanying images omitted)
Text: High specifications and strong performance. A top-grade camera with excellent image quality. Fast charging is not supported, so charging is slow. The rear camera is a major highlight of the iPhone SE, carrying over the top configuration of the iPhone 6s; although it lacks optical image stabilization, it still excites me.
Table 2. Statistics of the Multi-ZOL dataset
Attribute                    Value
Number of reviews            5228
Number of labels             10
Average words per review     315.11
Maximum words per review     8511
Minimum words per review     5
Average images per review    4.5
Maximum images per review    111
Minimum images per review    1
Table 3. Comparative experimental results
Table 4. Results of the ablation experiments
Model                   Precision (%)   F1 (%)
JAAIN                   74.57           74.48
-JAAIN(image)           68.04           67.80
-JAAIN(text)            39.32           30.36
-DAIMA                  72.21           72.20
-ADIMA                  73.20           73.09
-Transformer-encoder    73.27           73.28