Abstract: Social platforms allow users to express opinions and views through multiple information modalities, and fusing multimodal semantic information can predict the sentiment users express more effectively. Multimodal sentiment analysis has therefore received extensive attention in recent years. However, the image and the text of a post may be semantically unrelated, which degrades sentiment analysis performance. To address this problem, this paper proposes Multimodal Social Sentiment Analysis based on Semantic Correlation (MSSA-SC). MSSA-SC first applies an image-text semantic relevance classification model to determine whether the image and text of a social media post are semantically related. If they are related, an image-text semantic alignment multimodal model fuses image and text features for sentiment analysis; if they are unrelated, sentiment analysis is performed on the text modality alone. Experiments on real social media datasets show that MSSA-SC effectively reduces the impact of semantically unrelated image-text pairs on multimodal social media sentiment analysis. MSSA-SC achieves an Accuracy of 75.23% and a Macro-F1 of 70.18%, both higher than those of the baseline models.
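The routing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Post` type, the function `mssa_sc_predict`, and the three stub models are hypothetical names introduced here, and treating a post without an image as text-only is an added assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Post:
    """A social media post; `image` is None when no picture is attached."""
    text: str
    image: Optional[bytes] = None


def mssa_sc_predict(
    post: Post,
    is_related: Callable[[str, bytes], bool],
    multimodal_model: Callable[[str, bytes], str],
    text_model: Callable[[str], str],
) -> str:
    """Route a post to the fused image-text model or the text-only model."""
    # Assumption: a post with no image is handled like an unrelated pair.
    if post.image is not None and is_related(post.text, post.image):
        # Image and text are semantically related: use both modalities.
        return multimodal_model(post.text, post.image)
    # Image and text are unrelated: ignore the image.
    return text_model(post.text)


# Hypothetical stubs standing in for the trained models.
def stub_is_related(text: str, image: bytes) -> bool:
    return image.startswith(b"related")


def stub_multimodal(text: str, image: bytes) -> str:
    return "positive"


def stub_text_only(text: str) -> str:
    return "negative"


if __name__ == "__main__":
    post = Post("nice view today", b"related-image-bytes")
    print(mssa_sc_predict(post, stub_is_related, stub_multimodal, stub_text_only))
```

Passing the three models in as callables keeps the routing logic independent of any particular relevance classifier or sentiment model.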
Key words:
- multimodal
- social media
- sentiment analysis
- semantic correlation
- image-text feature fusion
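
Table 1. Data distribution of the image-text semantic correlation dataset

| Label | Number of Weibo image-text pairs |
|---|---|
| P | 7442 |
| N | 2375 |

Table 2. Data distribution of the Weibo image-text sentiment analysis dataset

| Label | Test samples | Training samples |
|---|---|---|
| Positive | 981 | 4660 |
| Neutral | 2404 | 9567 |
| Negative | 615 | 1903 |

Table 3. Experimental parameter settings

| Parameter | Value |
|---|---|
| Maximum sentence length | 256 |
| Batch size | 12 |
| Learning rate | 2 × 10⁻⁵ |
| Number of self-attention heads | 6 |
| Learning rate warm-up | 0.1 |
| Number of training epochs | 4.0 |

Table 4. Experimental results of Weibo image-text sentiment classification

| Modality | Method | Accuracy/% | Macro-F1/% |
|---|---|---|---|
| Image | ResNet | 58.91 | 45.27 |
| Text | BiLSTM | 70.92 | 63.48 |
| Text | BERT | 73.25 | 68.43 |
| Text | RoBERTa | 73.77 | 69.59 |
| Image-text | Multi-CNN | 68.31 | 62.46 |
| Image-text | MBERT | 74.52 | 68.70 |
| Image-text | SAMRoBERTa | 74.70 | 69.65 |
| Image-text | MSSA-SC | 75.23 | 70.18 |

Table 5. Image-text samples from the image-text sentiment analysis dataset

| Image-text sample | Actual sentiment label | SRMRoBERTa image-text semantic relevance classification |
|---|---|---|
| (sample not shown) | Negative | Image and text semantically unrelated |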
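The two metrics reported in Table 4 can be computed as in the sketch below, which uses the standard definitions: Accuracy is the fraction of correct predictions, and Macro-F1 is the unweighted mean of per-class F1 scores. The source does not show its evaluation code, so the function names and the toy labels are assumptions made for illustration. The toy data mimics the class imbalance in Table 2 and shows why Macro-F1 can sit well below Accuracy when minority classes are predicted poorly.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    f1_scores: List[float] = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, made-up labels: a classifier that always predicts the majority class.
labels = ["positive", "neutral", "negative"]
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
y_pred = ["neutral"] * 10

if __name__ == "__main__":
    print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
    print(f"macro-F1 = {macro_f1(y_true, y_pred, labels):.2f}")
```

On this toy input the always-neutral classifier is right on 6 of 10 samples, yet two of the three classes contribute an F1 of zero, so Macro-F1 is far lower than Accuracy.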