Classification of network public opinion propagation pattern based on variational reasoning
-
Abstract: With the rapid development of online social media, analyzing how public opinion information propagates has become a research hotspot. To address the low classification accuracy of multi-path generation on small-sample data in the task of classifying network public opinion propagation patterns, this paper proposes a definition of the knowledge graph structure for the public opinion propagation domain, builds a public opinion propagation knowledge graph and a propagation analysis task dataset from Weibo data, applies the GraphDIVA model to classify propagation patterns, and runs a 25-sample test experiment of propagation pattern classification on the self-built dataset. The results show that after 20 rounds of training the model's classification accuracy rises from 76% to 89.4%, indicating that the GraphDIVA model performs better at reducing the number of training rounds and improving classification accuracy.
-
Table 1. Data quantity statistics
Type       Count
Posts      712,577
Comments   388,689
Users      709,603
Hashtags   32,549
Keywords   74
Table 2. Blog post forwarding quantity statistics
Repost count              Number of posts
0                         651,502
> 0 and < 1,000           56,547
≥ 1,000 and < 10,000      3,275
≥ 10,000 and < 50,000     1,006
≥ 50,000 and < 200,000    201
≥ 200,000                 46
Maximum repost count      1,304,609
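The repost-count ranges of Table 2 can be reproduced with a simple binning step. The following is a minimal sketch, not the authors' code: the function names and the sample list are hypothetical, and only the range boundaries come from the table.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of the Table 2 ranges. A count falls in the last
# range whose lower bound is less than or equal to it.
BOUNDS = [0, 1, 1_000, 10_000, 50_000, 200_000]
LABELS = [
    "0",
    "1 to 999",
    "1,000 to 9,999",
    "10,000 to 49,999",
    "50,000 to 199,999",
    "200,000 or more",
]


def repost_bucket(count: int) -> str:
    """Map one repost count to its Table 2 range label."""
    if count < 0:
        raise ValueError("repost count cannot be negative")
    return LABELS[bisect_right(BOUNDS, count) - 1]


def bucket_histogram(counts):
    """Count how many posts fall into each range."""
    return Counter(repost_bucket(c) for c in counts)


# Hypothetical repost counts for illustration only; the last value is
# the maximum repost count reported in Table 2.
sample = [0, 0, 12, 999, 1_000, 75_000, 1_304_609]
hist = bucket_histogram(sample)
```

Using `bisect_right` on the sorted lower bounds keeps the half-open ranges (lower bound included, upper bound excluded) consistent with the table's "greater than or equal to / less than" wording.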
Table 3. Statistics of propagation status of blog posts
Type             Count
Original posts   98,173
Reposts          614,404
Table 4. Blog post hashtag statistics
Type                                 Value
Posts containing hashtags            96,209
Average number of hashtags           1.66
Maximum number of hashtags           24
Table 5. Statistics of comment like counts
Like count                Number of comments
0                         197,794
> 0 and < 1,000           183,092
≥ 1,000 and < 5,000       6,108
≥ 5,000 and < 20,000      1,438
≥ 20,000 and < 100,000    252
≥ 100,000                 5
Maximum like count        416,050
Table 6. Hashtag data statistics
Occurrences              Number of hashtags
1                        23,142
> 1 and < 20             8,424
≥ 20 and < 50            580
≥ 50 and < 200           301
≥ 200 and < 1,000        66
≥ 1,000                  11
Maximum occurrences      5,863
Table 7. Statistics of keyword occurrence times
Occurrences              Number of keywords
> 1 and < 100            4
≥ 100 and < 400          23
≥ 400 and < 1,000        23
≥ 1,000 and < 4,000      18
≥ 4,000                  6
Maximum occurrences      17,845
Table 8. Statistics of public opinion propagation depth
Type                     Number of posts   Total reposts
Posts with no reposts    66,899            0
Repost depth 1           27,267            343,720
Repost depth 2           2,473             150,780
Repost depth 3           540               74,564
Repost depth 4           141               28,389
Repost depth 5           71                21,148
Repost depth 6           33                11,698
Repost depth 7           17                6,040
Repost depth 8           4                 925
Repost depth 9           8                 6,127
Repost depth 10          3                 2,113
Repost depth 21          1                 154
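Statistics like those in Table 8 can be derived from repost relations. The sketch below assumes that "repost depth" means the longest repost chain below an original post and that the repost relations form a forest with no cycles; neither assumption is stated in the table. The `parent_of` mapping and the toy data are hypothetical.

```python
from collections import defaultdict


def propagation_stats(parent_of):
    """Return {root_id: (max_depth, total_reposts)} for a repost forest.

    parent_of maps a post id to the id of the post it reposts,
    or to None when the post is an original post.
    Assumes the repost relations contain no cycles.
    """
    children = defaultdict(list)
    roots = []
    for post, parent in parent_of.items():
        if parent is None:
            roots.append(post)
        else:
            children[parent].append(post)

    stats = {}
    for root in roots:
        max_depth, total = 0, 0
        stack = [(root, 0)]  # iterative depth-first traversal
        while stack:
            node, depth = stack.pop()
            max_depth = max(max_depth, depth)
            if depth > 0:  # every non-root node is one repost
                total += 1
            for child in children[node]:
                stack.append((child, depth + 1))
        stats[root] = (max_depth, total)
    return stats


# Hypothetical toy data: "a" is reposted by "b" and "c", and "b" by "d";
# "e" is an original post with no reposts.
parent_of = {"a": None, "b": "a", "c": "a", "d": "b", "e": None}
stats = propagation_stats(parent_of)
```

Grouping the resulting `(max_depth, total_reposts)` pairs by depth and summing the totals would give one row per depth, in the same shape as the table.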
Table 9. Sample distribution status
Split      News samples   Entertainment samples
Training   4,534          9,441
Test       1,538          3,120
Table 10. Fast-TransE parameters
Parameter        Value
embedding_size   100
nbatches         1
threads          8
epochs           1,000
alpha            0.001
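For readers unfamiliar with translation-based embeddings, the following is a minimal pure-Python sketch of the generic TransE scoring idea, in which a triple (head, relation, tail) is plausible when head + relation lies close to tail. It is not the Fast-TransE implementation used for Table 10; only `embedding_size` is taken from the table, and the L1 distance, the margin value, and all function names are illustrative assumptions.

```python
import math
import random

EMBEDDING_SIZE = 100  # embedding_size from Table 10


def random_embedding(rng, size=EMBEDDING_SIZE):
    """Draw a vector uniformly from [-6/sqrt(size), 6/sqrt(size)]."""
    bound = 6 / math.sqrt(size)
    return [rng.uniform(-bound, bound) for _ in range(size)]


def transe_score(head, relation, tail):
    """L1 distance between head + relation and tail; lower is more plausible."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss that pushes true triples below corrupted ones by a margin.

    The margin of 1.0 is an assumed value, not a parameter from Table 10.
    """
    return max(0.0, margin + pos_score - neg_score)


rng = random.Random(0)
head = random_embedding(rng)
relation = random_embedding(rng)
true_tail = [h + r for h, r in zip(head, relation)]  # perfectly consistent tail
corrupt_tail = random_embedding(rng)                 # randomly corrupted tail

score_true = transe_score(head, relation, true_tail)
score_corrupt = transe_score(head, relation, corrupt_tail)
```

Training would minimize `margin_loss` over many true and corrupted triples with a learning rate such as the `alpha` listed in the table; that loop is omitted here.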
Table 11. Propagation pattern analysis network parameters
Parameter                      Value
finder_lstm_width              200
finder_mlp_width               200
reasoner_cnn_filter_size       64
reasoner_aggregate_width       200
reasoner_aggregate_neighbors   25
reasoner_lstm_width            200
reasoner_mlp_width             200
Table 12. Propagation pattern analysis training parameters
Parameter                       Value
guided_learn_rate               0.001
guided_max_epoch                25
guided_stop_growth              0.0075
guided_max_path_width           5
unified_max_epoch               20
unified_posterior_learn_rate    0.01
unified_likelihood_learn_rate   0.00025
unified_prior_learn_rate        0.001
unified_max_path_width          10
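One way to carry the Table 12 values in code is a pair of immutable configuration objects. This is a sketch only: the class names are hypothetical, and splitting the parameters into a "guided" and a "unified" phase is an assumption read off the parameter prefixes, not something the table states.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GuidedPhase:
    """Parameters with the guided_ prefix in Table 12."""
    learn_rate: float = 0.001
    max_epoch: int = 25
    stop_growth: float = 0.0075
    max_path_width: int = 5


@dataclass(frozen=True)
class UnifiedPhase:
    """Parameters with the unified_ prefix in Table 12."""
    max_epoch: int = 20
    posterior_learn_rate: float = 0.01
    likelihood_learn_rate: float = 0.00025
    prior_learn_rate: float = 0.001
    max_path_width: int = 10


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container grouping both sets of parameters."""
    guided: GuidedPhase = field(default_factory=GuidedPhase)
    unified: UnifiedPhase = field(default_factory=UnifiedPhase)


config = TrainingConfig()
```

Freezing the dataclasses prevents accidental changes to the values during an experiment run, which helps keep results comparable with the reported settings.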