
Weibo tendency analysis based on sentimental object recognition and sentimental rules

WANG Zechen, WANG Shupeng, SUN Liyuan, ZHANG Lei, WANG Yong, HAO Bingchuan

Citation: WANG Zechen, WANG Shupeng, SUN Liyuan, et al. Weibo tendency analysis based on sentimental object recognition and sentimental rules[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 301-310. doi: 10.13700/j.bh.1001-5965.2020.0404(in Chinese)


doi: 10.13700/j.bh.1001-5965.2020.0404
Funds: National Natural Science Foundation of China (61931019)

More Information
    Corresponding author: WANG Shupeng, E-mail: wangshupeng@iie.ac.cn

  • CLC number: P391

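  • Abstract:

    Weibo platform data contain a large amount of information reflecting users' likes and dislikes, which is particularly important for applications that involve tendency analysis of posts. Existing analysis methods tend to focus on simple classification of post sentiment and cannot analyze the Weibo tendency toward specific types of entities. To solve the Weibo tendency analysis problem and determine the standpoint of posts, a semi-supervised learning approach is adopted: an entity recognition model is trained through co-training and active learning, and sentimental rules based on principal component analysis are constructed to extract the principal components of each sentence and normalize colloquial text into a specified format. The positive or negative nature of the targeted entity, the commendatory or derogatory sense of the sentiment words, and the sentence component that the sentiment words serve as are then used to carry sentiment classification one level deeper, to standpoint judgement. Standpoint judgement experiments are conducted on a practical problem. Self-comparison and cross-method comparison experiments on datasets of different sizes show that the accuracy of the model in judging post standpoint keeps rising as the number of posts with annotated entities increases, and that the accuracy of the proposed method is significantly higher than that of the comparison methods, exceeding the existing methods by 2.79% and 10.00%, respectively.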
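The abstract says that standpoint is derived from the polarity of the targeted entity together with the polarity of the sentiment word. A minimal sketch of that idea follows. It is an illustration only, not the paper's rule set: the toy lexicons, the English tokens, the function name `judge_standpoint`, and the sign-product rule are all assumptions, and the sentence-component factor mentioned in the abstract is left out.

```python
# Hypothetical sketch: combine target-entity polarity with sentiment-word polarity.
# The lexicons and the product rule are assumptions made for illustration.
POSITIVE, NEGATIVE, NEUTRAL = 1, -1, 0

# Toy lexicons (assumed, not taken from the paper).
ENTITY_POLARITY = {"vaccine": POSITIVE, "rumor": NEGATIVE}
SENTIMENT_POLARITY = {
    "support": POSITIVE,
    "praise": POSITIVE,
    "oppose": NEGATIVE,
    "condemn": NEGATIVE,
}


def judge_standpoint(entity: str, sentiment_word: str) -> int:
    """Return +1 (positive standpoint), -1 (negative), or 0 (undetermined)."""
    entity_sign = ENTITY_POLARITY.get(entity, NEUTRAL)
    sentiment_sign = SENTIMENT_POLARITY.get(sentiment_word, NEUTRAL)
    # Praising a positive entity or condemning a negative one both count as
    # a positive standpoint; mixed signs count as a negative standpoint.
    return entity_sign * sentiment_sign


if __name__ == "__main__":
    print(judge_standpoint("rumor", "condemn"))
    print(judge_standpoint("vaccine", "oppose"))
```

Under this assumed rule, a post that condemns a negatively regarded entity is scored the same way as one that praises a positively regarded entity, which is the kind of distinction plain positive/negative sentiment classification cannot make.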

     

  • Figure 1.  Algorithm architecture of OASOSR

    Figure 2.  Extraction flowchart of sentimental object entity sets

    Figure 3.  OASOSR algorithm flowchart

    Figure 4.  Accuracy of standpoint judgement by OASOSR algorithm on different datasets

    Figure 5.  Accuracy of Weibo standpoint judgement based on filtering conditions of different models

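    Table 1.  Features of Weibo sentiment analysis methods

    Sentiment analysis method | Features
    Semantic lexicon | Comprises multiple lexicons and a syntactic rule base; uses the knowledge base for aggregate computation; requires building a Weibo sentiment analysis database
    Traditional machine learning | Builds feature vectors; finds associations between features and classification results; requires extensive manual annotation
    Deep learning | Segments text and represents words as word vectors; deep neural networks extract semantic information; builds sentiment representation vectors; requires long training time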

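    Table 2.  Comparison of features of sentiment analysis methods based on deep learning

    Sentiment analysis method | Features
    Unsupervised learning | Discovers intrinsic lexical sentiment patterns in text data; needs no manual annotation
    Supervised learning | Relatively high running efficiency; classification errors affect subsequent training
    Semi-supervised learning | Requires only a small amount of labeled data; suited to tasks with large data volumes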
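Table 2 notes that semi-supervised learning needs only a small amount of labeled data. One common way to get there is to let the model label the samples it is confident about and send the uncertain ones to a human annotator. The sketch below shows only that selection step. It is a simplified stand-in, not the co-training and active learning procedure of the paper: the function name `split_by_confidence`, the thresholds, and the binary-probability input are assumptions.

```python
# Hypothetical sketch of confidence-based sample selection in semi-supervised
# training. Thresholds and the input format are assumptions for illustration.
def split_by_confidence(probabilities, high=0.9, low=0.6):
    """Split an unlabeled pool by model confidence.

    probabilities: iterable of (sample_id, p) pairs, where p is the model's
    predicted probability of the positive class for that sample.
    Returns three lists of sample ids: confident samples that can take the
    model's own label, uncertain samples to hand to a human annotator, and
    the remaining samples, which stay in the pool.
    """
    pseudo_labeled, to_annotate, remaining = [], [], []
    for sample_id, p in probabilities:
        confidence = max(p, 1.0 - p)  # confidence in the predicted class
        if confidence >= high:
            pseudo_labeled.append(sample_id)
        elif confidence <= low:
            to_annotate.append(sample_id)
        else:
            remaining.append(sample_id)
    return pseudo_labeled, to_annotate, remaining


if __name__ == "__main__":
    pool = [("a", 0.97), ("b", 0.52), ("c", 0.75), ("d", 0.05)]
    print(split_by_confidence(pool))
```

In a co-training setup, two models trained on different views of the data would exchange their confident predictions; the single-model split here only illustrates how the labeling effort is divided.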

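    Table 3.  Datasets selected for comparative experiment

    Data type | Dataset | Size (entries) | Data format
    Crawled data | #新冠肺炎疫情# (COVID-19 epidemic) topic | 38 175 | Label (positive, negative), text
    Public data | weibo_senti_100k open-source Weibo sentiment | 119 989 | Label (positive, negative), text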

    Table 4.  Size of datasets selected for comparative experiment

    Dataset | Number of experimental entries
    #新冠肺炎疫情# (COVID-19 epidemic) topic | 485
    weibo_senti_100k open-source Weibo sentiment | 4 000

    Table 5.  Accuracy of Weibo standpoint judgement based on different models

    Method | Accuracy/%
    SCSVM | 78.56
    SAMPL | 71.35
    OASOSR | 81.35
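The gains of 2.79% and 10.00% quoted in the abstract can be checked against the accuracies in Table 5; they are absolute differences in percentage points. The short check below uses only the three values from the table. The function name `gains_over_baselines` is hypothetical.

```python
# Accuracy values copied from Table 5 (percent).
accuracy = {"SCSVM": 78.56, "SAMPL": 71.35, "OASOSR": 81.35}


def gains_over_baselines(acc, proposed="OASOSR"):
    """Return the absolute accuracy gain of the proposed method over each baseline."""
    return {
        method: round(acc[proposed] - value, 2)
        for method, value in acc.items()
        if method != proposed
    }


gains = gains_over_baselines(accuracy)
print(gains)  # {'SCSVM': 2.79, 'SAMPL': 10.0}
```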
Publication history
  • Received:  2020-08-09
  • Accepted:  2020-09-25
  • Published online:  2022-02-20
