Weibo tendency analysis based on sentimental object recognition and sentimental rules
Abstract: Weibo data contain a large amount of information reflecting users' likes and dislikes, which is important for applications such as popular trend judgment, precision marketing, and public opinion monitoring. Existing methods, however, tend to focus on simple sentiment classification of posts and cannot analyze the tendency of a post toward a specific type of entity. To address Weibo tendency analysis and stance (position) detection, we adopt a semi-supervised learning approach: an entity recognition model is trained through co-training and active learning, and deep learning is combined with sentiment rules. Sentiment rules based on principal component analysis are constructed to extract the main components of each sentence and normalize colloquial text into a specified format. The polarity (positive or negative) of the targeted entity, the commendatory or derogatory sense of the sentiment words, and the sentence component that each sentiment word serves as are then used to carry sentiment classification one level deeper, to stance determination. Stance determination experiments on a practical problem, comprising a self-comparison experiment and a comparison with other methods on datasets of different sizes, show that the accuracy of the model keeps improving as the number of entity-labeled posts increases, and that the accuracy of the proposed method is significantly higher than that of the comparison methods, by 2.79 and 10.00 percentage points respectively.
Keywords:
- sentiment analysis
- position detection
- semi-supervised learning
- tendency
- sentimental rule
- co-training
- active learning
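The abstract names three inputs to stance determination: the polarity of the targeted entity, the commendatory or derogatory sense of the sentiment word, and the sentence component the sentiment word serves as. How they are combined is not given here, so the following is only a minimal sketch under stated assumptions: polarities are encoded as +1 and -1, the stance is taken as their product, and the sentiment word counts only when it fills one of a hypothetical set of roles. The names `NormalizedSentence`, `DIRECT_ROLES`, and `judge_stance` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass

# Assumed encoding: +1 = positive / commendatory, -1 = negative / derogatory.
POSITIVE, NEGATIVE, NEUTRAL = 1, -1, 0

# Hypothetical set of sentence components in which the sentiment word is
# taken to comment directly on the entity; the source does not list these.
DIRECT_ROLES = {"predicate", "complement", "attribute"}


@dataclass(frozen=True)
class NormalizedSentence:
    """A post reduced to the three signals named in the abstract."""
    entity_polarity: int     # polarity of the targeted entity (+1 or -1)
    sentiment_polarity: int  # sense of the sentiment word (+1 or -1)
    sentiment_role: str      # sentence component of the sentiment word


def judge_stance(sentence: NormalizedSentence) -> int:
    """Return +1, -1, or 0 using an assumed sign-product rule."""
    # If the sentiment word is not in a role that targets the entity,
    # this sketch declines to assign a stance.
    if sentence.sentiment_role not in DIRECT_ROLES:
        return NEUTRAL
    # Praise of a positive entity or criticism of a negative entity gives +1;
    # the mixed cases give -1.
    return sentence.entity_polarity * sentence.sentiment_polarity


if __name__ == "__main__":
    samples = [
        NormalizedSentence(POSITIVE, POSITIVE, "predicate"),
        NormalizedSentence(NEGATIVE, NEGATIVE, "predicate"),
        NormalizedSentence(POSITIVE, NEGATIVE, "complement"),
        NormalizedSentence(POSITIVE, NEGATIVE, "adverbial"),
    ]
    for sample in samples:
        print(sample, "->", judge_stance(sample))
```

The product rule is chosen only because it is the simplest way to let entity polarity flip the reading of a sentiment word; the rule set actually used by the authors may differ.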
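Table 1. Features of Weibo sentiment analysis methods

| Sentiment analysis method | Features |
| --- | --- |
| Semantic lexicon | Comprises multiple lexicons and a syntactic rule base; uses a knowledge base for aggregate computation; requires building a Weibo sentiment analysis database |
| Traditional machine learning | Builds feature vectors; finds associations between features and classification results; requires extensive manual labeling |
| Deep learning | Segments text and represents words as word vectors; deep neural networks extract semantic information; builds sentiment representation vectors; requires long training time |

Table 2. Comparison of features of sentiment analysis methods based on deep learning

| Sentiment analysis method | Features |
| --- | --- |
| Unsupervised learning | Discovers the inherent lexical sentiment patterns in text data; requires no manual labeling |
| Supervised learning | Relatively high running efficiency; classification errors affect subsequent training |
| Semi-supervised learning | Requires only a small amount of labeled data; suitable for tasks with large data volumes |

Table 3. Datasets selected for comparative experiment

| Data type | Dataset | Size (posts) | Data format |
| --- | --- | --- | --- |
| Crawled data | #新冠肺炎疫情# (COVID-19 epidemic) topic | 38 175 | Label (positive, negative), text |
| Public data | weibo_senti_100k open-source Weibo sentiment dataset | 119 989 | Label (positive, negative), text |

Table 4. Size of datasets selected for comparative experiment

| Dataset | Number of experimental posts |
| --- | --- |
| #新冠肺炎疫情# (COVID-19 epidemic) topic | 485 |
| weibo_senti_100k open-source Weibo sentiment dataset | 4 000 |

Table 5. Accuracy of Weibo standpoint judgement based on different models

| Method | Accuracy/% |
| --- | --- |
| SCSVM | 78.56 |
| SAMPL | 71.35 |
| OASOSR | 81.35 |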
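The improvement figures quoted in the abstract can be checked against Table 5 with a short calculation. The sketch below assumes that OASOSR is the proposed method; the table does not say so, and this is inferred from it having the highest accuracy and from the differences matching the abstract.

```python
# Accuracy values (%) copied from Table 5.
accuracy = {"SCSVM": 78.56, "SAMPL": 71.35, "OASOSR": 81.35}

# Assumption: OASOSR is the proposed method (inferred, not stated in the table).
proposed = "OASOSR"

# Absolute gain in percentage points over each comparison method,
# rounded to two decimals to suppress floating-point noise.
gains = {
    name: round(accuracy[proposed] - value, 2)
    for name, value in accuracy.items()
    if name != proposed
}

for name, gain in gains.items():
    print(f"{proposed} vs {name}: +{gain:.2f} percentage points")
```

The gains are absolute differences in accuracy (percentage points), not relative improvements.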