Weibo tendency analysis based on sentimental object recognition and sentimental rules

王泽辰; 王树鹏; 孙立远; 张磊; 王勇; 郝冰川

doi:10.13700/j.bh.1001-5965.2020.0404

CLC number: P391
Publication history
- Received: 2020-08-09
- Accepted: 2020-09-25
- Published online: 2022-02-20

1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100193, China
2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100085, China

Funds: National Natural Science Foundation of China (61931019)

Corresponding author: WANG Shupeng, E-mail: wangshupeng@iie.ac.cn

Abstract:
Weibo contains a large amount of information reflecting users' likes and dislikes, which is important for applications that depend on the tendency of posts, such as trend prediction, precision marketing, and public opinion monitoring. Existing methods, however, tend to focus on simple sentiment classification of posts and cannot analyze the tendency of a post toward specific types of entities. To address Weibo tendency analysis and position (stance) detection, we adopt a semi-supervised learning approach: entity recognition models are trained through co-training and active learning, and deep learning is combined with sentiment rules. The sentiment rules, based on principal component analysis, extract the main components of each sentence and normalize colloquial text into a specified format. We then use the positive or negative nature of the targeted entities, the commendatory or derogatory sense of the sentiment words, and the sentence component that each sentiment word serves as to judge the tendency of a post, taking sentiment classification one level deeper to stance detection. Stance detection experiments on a practical problem, comprising self-comparison and cross-method comparison on datasets of different sizes, show that the accuracy of the model keeps improving as the number of posts with labeled entities increases. The accuracy of the proposed method is also significantly higher than that of the comparison methods: 2.79 and 10.00 percentage points higher, respectively, than the two existing methods used for comparison.

Keywords:
- sentiment analysis
- position detection
- semi-supervised learning
- tendency
- sentimental rule
- co-training
- active learning
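
The rule-based step described in the abstract combines three signals: the polarity of the targeted entity, the polarity of the sentiment word, and the sentence component the sentiment word fills. The abstract does not state the actual rules, so the following is only a minimal sketch under assumed rules. The toy lexicons, the role names ("predicate", "attribute"), the sign-multiplication rule, and the `Clause` and `stance` names are all hypothetical and are not taken from the OASOSR algorithm.

```python
from dataclasses import dataclass

# Hypothetical toy lexicons; the real entity sets and sentiment lexicon are not given here.
ENTITY_POLARITY = {"vaccine": +1, "rumor": -1}
WORD_POLARITY = {"support": +1, "effective": +1, "oppose": -1, "harmful": -1}


@dataclass
class Clause:
    entity: str           # sentimental object the clause points at
    sentiment_word: str   # sentiment-bearing word in the normalized clause
    role: str             # assumed sentence component: "predicate" or "attribute"
    negated: bool = False


def stance(clause: Clause) -> int:
    """Return +1 (positive tendency), -1 (negative tendency), or 0 (undetermined)."""
    entity_sign = ENTITY_POLARITY.get(clause.entity, 0)
    word_sign = WORD_POLARITY.get(clause.sentiment_word, 0)
    if entity_sign == 0 or word_sign == 0:
        return 0  # unknown entity or word: no rule applies
    if clause.negated:
        word_sign = -word_sign
    if clause.role == "predicate":
        # Assumed rule: the word acts on the entity, so opposing a negative
        # entity counts as a positive tendency and vice versa.
        return entity_sign * word_sign
    # Assumed rule: as an attribute the word describes the entity directly.
    return word_sign


print(stance(Clause("rumor", "oppose", "predicate")))
print(stance(Clause("vaccine", "harmful", "attribute")))
```

Under these assumptions, plain sentiment classification would label "oppose the rumor" as negative because of the word "oppose", whereas the entity-aware rule yields a positive tendency; this is the kind of distinction the abstract refers to as going deeper than sentiment classification.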

Figure 1. Algorithm architecture of OASOSR

Figure 2. Extraction flowchart of sentimental object entity sets

Figure 3. OASOSR algorithm flowchart

Figure 4. Accuracy of standpoint judgement by OASOSR algorithm on datasets of different sizes

Figure 5. Accuracy of Weibo standpoint judgement based on filtering conditions of different models

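Table 1. Features of Weibo sentiment analysis methods

Sentiment analysis method	Features
Semantic lexicon	Comprises multiple lexicons and syntactic rule bases; performs aggregate computation using a knowledge base; requires building a Weibo sentiment analysis database
Traditional machine learning	Builds feature vectors; finds associations between features and classification results; requires extensive manual labeling
Deep learning	Segments text and represents words as word vectors; extracts semantic information with deep neural networks; builds sentiment representation vectors; requires long training time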
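
To make the lexicon-based row of Table 1 concrete, the sketch below scores a tokenized sentence with a polarity lexicon and a single negation rule. It is a toy illustration only: the word lists, the rule that a negator flips the next sentiment word, and the function name `lexicon_score` are assumptions, not the lexicons or rule bases used in the cited work.

```python
# Hypothetical miniature lexicons, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}
NEGATORS = {"not", "never", "no"}


def lexicon_score(tokens):
    """Sum word polarities; a negator flips the next sentiment word it precedes."""
    score = 0
    flip = False
    for token in tokens:
        word = token.lower()
        if word in NEGATORS:
            flip = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1, or 0
        if polarity:
            score += -polarity if flip else polarity
            flip = False  # the negation has been consumed
    return score


print(lexicon_score("this is not very good".split()))
```

A positive total is read as positive sentiment and a negative total as negative; real lexicon systems add intensity weights and richer syntactic rules on top of this aggregation.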

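Table 2. Comparison of features of sentiment analysis methods based on deep learning

Sentiment analysis method	Features
Unsupervised learning	Discovers the inherent lexical sentiment patterns in text data; requires no manual labeling
Supervised learning	Relatively high running efficiency; classification errors affect subsequent training
Semi-supervised learning	Requires labeling only a small amount of data; suitable for tasks with large data volumes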
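
The semi-supervised row of Table 2, together with the co-training and active learning mentioned in the abstract, can be illustrated by the step that decides what happens to each unlabeled post in one training round. The sketch below is a simplified, agreement-based variant and not the procedure from the paper: the two probability lists stand in for two hypothetical models, and the thresholds `confident` and `uncertain` and the function name `route_unlabeled` are assumptions.

```python
def route_unlabeled(probs_a, probs_b, confident=0.9, uncertain=0.6):
    """Split unlabeled items by the positive-class probabilities of two models.

    Returns (auto_labeled, to_query, deferred):
      auto_labeled: (index, label) pairs both models agree on with high confidence
      to_query:     indices to send to a human annotator (active learning)
      deferred:     indices left for a later round
    """
    auto_labeled, to_query, deferred = [], [], []
    for i, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        label_a, label_b = pa >= 0.5, pb >= 0.5
        conf_a, conf_b = max(pa, 1 - pa), max(pb, 1 - pb)
        if label_a == label_b and min(conf_a, conf_b) >= confident:
            # Co-training style: accept the pseudo-label without human effort.
            auto_labeled.append((i, int(label_a)))
        elif label_a != label_b or max(conf_a, conf_b) < uncertain:
            # Active learning style: disagreement or joint uncertainty is informative.
            to_query.append(i)
        else:
            deferred.append(i)
    return auto_labeled, to_query, deferred


auto, query, later = route_unlabeled([0.95, 0.2, 0.55, 0.8], [0.97, 0.7, 0.52, 0.85])
print(auto, query, later)
```

Only the queried items cost annotation effort, which is why such a loop needs far fewer manual labels than fully supervised training.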

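Table 3. Datasets selected for comparative experiment

Data type	Dataset	Size (posts)	Data format
Crawled data	#新冠肺炎疫情# (COVID-19 epidemic) topic	38 175	Label (positive, negative), text
Public data	weibo_senti_100k open-source Weibo sentiment dataset	119 989	Label (positive, negative), text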

Table 4. Size of datasets selected for comparative experiment

Dataset	Number of posts used in experiments
#新冠肺炎疫情# (COVID-19 epidemic) topic	485
weibo_senti_100k open-source Weibo sentiment dataset	4 000

Table 5. Accuracy of Weibo standpoint judgement based on different models

Method	Accuracy/%
SCSVM	78.56
SAMPL	71.35
OASOSR	81.35
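
The improvements quoted in the abstract can be checked directly against Table 5. The short calculation below uses only the three accuracies from the table; the variable names are illustrative.

```python
# Accuracies from Table 5, in percent.
accuracy = {"SCSVM": 78.56, "SAMPL": 71.35, "OASOSR": 81.35}

# Gain of OASOSR over each comparison method, in percentage points.
gains = {
    method: round(accuracy["OASOSR"] - value, 2)
    for method, value in accuracy.items()
    if method != "OASOSR"
}
print(gains)  # {'SCSVM': 2.79, 'SAMPL': 10.0}
```

The differences are absolute gaps in accuracy (percentage points), not relative improvements.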
