Classification of network public opinion propagation pattern based on variational reasoning

TANG Hongmei^{1, 2}; TANG Wenzhong^{1}; LI Ruichen^{1}; WANG Yanyang^{3, 4, *}; WANG Lihong^{5}

doi:10.13700/j.bh.1001-5965.2020.0538

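1. School of Computer Science and Engineering, Beihang University, Beijing 100083, China
2. Science and Technology Projects Service Center of Xinjiang Uygur Autonomous Region, Urumqi 830000, China
3. School of Aeronautic Science and Engineering, Beihang University, Beijing 100083, China
4. Beihang Jiangxi Research Institute, Nanchang 330096, China
5. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China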

Funds: Natural Science Foundation of Xinjiang Uygur Autonomous Region (2020D01A95)

Corresponding author: WANG Yanyang, E-mail: wangyanyang@buaa.edu.cn

CLC number: TP391
Publication history:
- Received: 2020-09-22
- Accepted: 2020-10-23
- Published online: 2022-02-20

Abstract: With the rapid development of online social media, analyzing the propagation patterns of public opinion information has become a research hotspot. To address the low classification accuracy of small-sample, multi-path generation in network public opinion propagation pattern classification, this paper defines a knowledge graph structure for the public opinion propagation domain, builds a public opinion propagation knowledge graph and a propagation analysis task dataset from Weibo data, and uses the GraphDIVA model to classify public opinion propagation patterns. A 25-sample classification test was conducted on the self-built dataset. The results show that after 20 rounds of training the classification accuracy of the model rose from 76% to 89.4%, indicating that GraphDIVA is more effective at reducing the number of training rounds and improving classification accuracy.
Keywords:
- public opinion propagation pattern
- knowledge graph
- knowledge graph reasoning
- graph neural network
- pattern analysis

Figure 1. GraphDIVA path feature generation process

Figure 2. GraphDIVA path reasoning module structure

Figure 3. Example of basic pattern of Weibo public opinion information propagation

Figure 4. Results of the 100-sample test of public opinion propagation pattern classification

Figure 5. Results of the 25-sample test of public opinion propagation pattern classification

Figure 6. News case analysis results

Figure 7. Entertainment case analysis results

Table 1. Data quantity statistics

Category	Count
Blog posts	712 577
Comments	388 689
Users	709 603
Hashtags	32 549
Keywords	74
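
The entity types counted in Table 1 (blog posts, comments, users, hashtags, keywords) appear to correspond to the node types of the public opinion propagation knowledge graph. The sketch below shows one way such a graph could be held as (head, relation, tail) triples. It is an illustration under assumptions, not the schema defined in the paper: the relation names (`posts`, `has_hashtag`, `reposts`), the class `PropagationKG` and all entity IDs are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

# Entity types taken from Table 1; relation names used below are hypothetical.
ENTITY_TYPES = {"post", "comment", "user", "hashtag", "keyword"}


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


class PropagationKG:
    """Tiny in-memory triple store indexed by head entity (illustrative only)."""

    def __init__(self) -> None:
        self.entity_type: dict[str, str] = {}
        self.out_edges: dict[str, list[Triple]] = defaultdict(list)

    def add_entity(self, entity_id: str, entity_type: str) -> None:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entity_type[entity_id] = entity_type

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        # Both endpoints must be registered first so every node has a type.
        for e in (head, tail):
            if e not in self.entity_type:
                raise KeyError(f"unregistered entity: {e}")
        self.out_edges[head].append(Triple(head, relation, tail))

    def neighbors(self, entity_id: str) -> list[str]:
        # .get avoids inserting empty lists for unknown heads.
        return [t.tail for t in self.out_edges.get(entity_id, [])]


kg = PropagationKG()
for eid, etype in [("u1", "user"), ("u2", "user"), ("p1", "post"),
                   ("p2", "post"), ("#h1", "hashtag")]:
    kg.add_entity(eid, etype)
kg.add_triple("u1", "posts", "p1")        # hypothetical relation names
kg.add_triple("p1", "has_hashtag", "#h1")
kg.add_triple("u2", "posts", "p2")
kg.add_triple("p2", "reposts", "p1")      # p2 is a repost of p1
print(kg.neighbors("p1"))  # ['#h1']
```

Indexing by head entity keeps neighbor lookup cheap, which is what a path-finding model needs when it walks outward from a source entity.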

Table 2. Statistics of blog post repost counts

Category	Count
Reposts = 0	651 502
0 < reposts < 1 000	56 547
1 000 ≤ reposts < 10 000	3 275
10 000 ≤ reposts < 50 000	1 006
50 000 ≤ reposts < 200 000	201
Reposts ≥ 200 000	46
Maximum repost count	1 304 609
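
Table 2 buckets posts by repost count using half-open intervals. The sketch below reproduces that bucketing and checks that the bucket sizes add up to the blog-post total in Table 1. The function name `repost_bucket` is hypothetical; the bounds and labels simply restate the table rows.

```python
from bisect import bisect_right

# Lower bounds of the Table 2 buckets; a count c falls in the last bucket
# whose lower bound is <= c.
BOUNDS = [0, 1, 1_000, 10_000, 50_000, 200_000]
LABELS = [
    "reposts = 0",
    "0 < reposts < 1 000",
    "1 000 <= reposts < 10 000",
    "10 000 <= reposts < 50 000",
    "50 000 <= reposts < 200 000",
    "reposts >= 200 000",
]


def repost_bucket(reposts: int) -> str:
    """Map a non-negative repost count to its Table 2 row label."""
    if reposts < 0:
        raise ValueError("repost count must be non-negative")
    return LABELS[bisect_right(BOUNDS, reposts) - 1]


# Bucket sizes as reported in Table 2.
TABLE2 = [651_502, 56_547, 3_275, 1_006, 201, 46]
print(sum(TABLE2))               # 712577, the blog-post total in Table 1
print(repost_bucket(999))        # 0 < reposts < 1 000
print(repost_bucket(1_304_609))  # reposts >= 200 000
```

The same check works for the other bucketed tables: the like-count buckets in Table 5 sum to 388 689, matching the comment total in Table 1, and the keyword buckets in Table 7 sum to 74.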

Table 3. Statistics of blog posts by propagation role

Category	Count
Original posts	98 173
Reposts	614 404

Table 4. Blog post hashtag statistics

Category	Value
Posts containing hashtags	96 209
Average number of hashtags	1.66
Maximum number of hashtags	24

Table 5. Statistics of comment like counts

Category	Count
Likes = 0	197 794
0 < likes < 1 000	183 092
1 000 ≤ likes < 5 000	6 108
5 000 ≤ likes < 20 000	1 438
20 000 ≤ likes < 100 000	252
Likes ≥ 100 000	5
Maximum like count	416 050

Table 6. Hashtag data statistics

Category	Count
Occurrences = 1	23 142
1 < occurrences < 20	8 424
20 ≤ occurrences < 50	580
50 ≤ occurrences < 200	301
200 ≤ occurrences < 1 000	66
Occurrences ≥ 1 000	11
Maximum occurrences	5 863

Table 7. Statistics of keyword occurrence counts

Category	Count
1 < occurrences < 100	4
100 ≤ occurrences < 400	23
400 ≤ occurrences < 1 000	23
1 000 ≤ occurrences < 4 000	18
Occurrences ≥ 4 000	6
Maximum occurrences	17 845

Table 8. Statistics of public opinion propagation depth

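Category	Number of posts	Total reposts
Posts without reposts	66 899	0
Repost depth 1	27 267	343 720
Repost depth 2	2 473	150 780
Repost depth 3	540	74 564
Repost depth 4	141	28 389
Repost depth 5	71	21 148
Repost depth 6	33	11 698
Repost depth 7	17	6 040
Repost depth 8	4	925
Repost depth 9	8	6 127
Repost depth 10	3	2 113
Repost depth 21	1	154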
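
Table 8 groups original posts by repost depth together with the total number of reposts in each group. A minimal sketch of computing both quantities for one cascade with breadth-first search follows. It assumes that depth means the number of hops on the longest repost chain below the original post (so a post with no reposts has depth 0) and that the input maps each repost to the post it directly reposted; the paper's exact definition and data format are not given in this section, and `repost_depth` is a hypothetical name.

```python
from collections import defaultdict, deque


def repost_depth(root: str, parent_of: dict[str, str]) -> tuple[int, int]:
    """Return (depth, total_reposts) of the cascade rooted at `root`.

    Assumptions: depth is the hop count of the longest repost chain, and
    `parent_of` maps each repost id to the id it directly reposted.
    Repost links are assumed to form a tree (no cycles).
    """
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    depth, total = 0, 0
    queue = deque([(root, 0)])
    while queue:  # breadth-first traversal from the original post
        node, level = queue.popleft()
        depth = max(depth, level)
        for child in children[node]:
            total += 1
            queue.append((child, level + 1))
    return depth, total


# Hypothetical cascade: p0 <- r1 <- r3 and p0 <- r2
cascade = {"r1": "p0", "r2": "p0", "r3": "r1"}
print(repost_depth("p0", cascade))  # (2, 3)
print(repost_depth("p9", cascade))  # (0, 0): no reposts
```

Building the child index once keeps the traversal linear in the number of reposts, which matters for cascades the size of the largest ones in Table 2.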

Table 9. Sample distribution

Split	News	Entertainment
Training	4 534	9 441
Test	1 538	3 120
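
The counts in Table 9 put roughly three quarters of each category into the training set, as the following arithmetic on the table's own numbers shows:

```latex
\frac{4\,534}{4\,534 + 1\,538} = \frac{4\,534}{6\,072} \approx 74.7\%,
\qquad
\frac{9\,441}{9\,441 + 3\,120} = \frac{9\,441}{12\,561} \approx 75.2\%
```

So both categories follow an approximately 3:1 training/test split, while the entertainment category contributes roughly twice as many samples as the news category.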

Table 10. Fast-TransE parameters

Parameter	Value
embedding_size	100
nbatches	1
threads	8
epochs	1 000
alpha	0.001
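
Table 10 configures Fast-TransE, which, judging by its name, trains TransE embeddings for the knowledge graph. TransE models a triple (h, r, t) as a translation h + r ≈ t and is trained with a margin ranking loss that separates true triples from corrupted ones. The sketch below is plain Python under stated assumptions: `embedding_size` = 100 comes from Table 10, while the L1 distance, the margin of 1.0 and the random vectors are assumptions not given in the table.

```python
import math
import random

EMBEDDING_SIZE = 100  # embedding_size in Table 10
MARGIN = 1.0          # assumption: the margin is not listed in Table 10


def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Draw a random vector and L2-normalise it, as TransE does for entities."""
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def transe_distance(h: list[float], r: list[float], t: list[float]) -> float:
    """L1 distance ||h + r - t||_1; smaller means a more plausible triple."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def margin_loss(pos: float, neg: float, margin: float = MARGIN) -> float:
    """Hinge loss: zero once the corrupted triple is `margin` farther away."""
    return max(0.0, margin + pos - neg)


rng = random.Random(0)
h = random_unit_vector(EMBEDDING_SIZE, rng)
r = random_unit_vector(EMBEDDING_SIZE, rng)
t = [hi + ri for hi, ri in zip(h, r)]                # perfectly translated tail
t_corrupt = random_unit_vector(EMBEDDING_SIZE, rng)  # corrupted tail

d_pos = transe_distance(h, r, t)
d_neg = transe_distance(h, r, t_corrupt)
print(d_pos, d_neg, margin_loss(d_pos, d_neg))
```

In training, this loss would be minimised over many sampled true/corrupted pairs by gradient descent; the sketch only shows the scoring function and loss, not the optimiser.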

Table 11. Propagation pattern analysis network parameters

Parameter	Value
finder_lstm_width	200
finder_mlp_width	200
reasoner_cnn_filter_size	64
reasoner_aggregate_width	200
reasoner_aggregate_neighbors	25
reasoner_lstm_width	200
reasoner_mlp_width	200
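
Table 11 sets `reasoner_aggregate_neighbors` to 25 and `reasoner_aggregate_width` to 200, which suggests that the path reasoner aggregates a fixed-size sample of each entity's neighbors in the style of GraphSAGE (Hamilton et al., 2017). The sketch below samples up to 25 neighbors and applies a mean aggregator, then concatenates the result with the node's own features. The mean aggregator, the toy 4-dimensional features and the function names are assumptions for illustration; the learned projection to the aggregate width and the nonlinearity are omitted.

```python
import random

NUM_NEIGHBORS = 25  # reasoner_aggregate_neighbors in Table 11


def sample_neighbors(neighbors: list[str], k: int, rng: random.Random) -> list[str]:
    """Return exactly k neighbors: without replacement if enough exist,
    otherwise with replacement (one common fixed-size sampling scheme)."""
    if not neighbors:
        return []
    if len(neighbors) >= k:
        return rng.sample(neighbors, k)
    return [rng.choice(neighbors) for _ in range(k)]


def mean_aggregate(node: str, adj: dict[str, list[str]],
                   feats: dict[str, list[float]], rng: random.Random,
                   k: int = NUM_NEIGHBORS) -> list[float]:
    """Concatenate a node's features with the mean of its sampled neighbors'.
    A learned weight matrix and nonlinearity would normally follow."""
    self_feat = feats[node]
    sampled = sample_neighbors(adj.get(node, []), k, rng)
    if not sampled:
        neigh_mean = [0.0] * len(self_feat)
    else:
        neigh_mean = [sum(feats[n][i] for n in sampled) / len(sampled)
                      for i in range(len(self_feat))]
    return self_feat + neigh_mean


rng = random.Random(42)
adj = {"p1": ["u1", "#h1"], "u1": ["p1"], "#h1": ["p1"]}
feats = {"p1": [1.0, 0.0, 0.0, 0.0],
         "u1": [0.0, 1.0, 0.0, 0.0],
         "#h1": [0.0, 0.0, 1.0, 0.0]}
print(mean_aggregate("u1", adj, feats, rng))
# [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Fixing the sample size bounds the cost per node regardless of degree, which is useful in a graph where a single post can collect over a million reposts (Table 2).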

Table 12. Propagation pattern analysis training parameters

Parameter	Value
guided_learn_rate	0.001
guided_max_epoch	25
guided_stop_growth	0.0075
guided_max_path_width	5
unified_max_epoch	20
unified_posterior_learn_rate	0.01
unified_likelihood_learn_rate	0.00025
unified_prior_learn_rate	0.001
unified_max_path_width	10
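
Table 12 gives separate learning rates to a posterior, a likelihood and a prior in the unified training stage. This matches the structure of DIVA (Chen et al., "Variational knowledge graph reasoning", 2018), which treats the reasoning path L between an entity pair (e_s, e_d) as a latent variable and maximises an evidence lower bound on the probability of relation r. Assuming GraphDIVA keeps this objective (its training details are not included in this section), the bound can be written as follows, with symbols introduced here: q_φ is the posterior path finder, p_θ the path reasoner (likelihood) and p_β the prior path finder.

```latex
\log p(r \mid e_s, e_d) \;\ge\;
\mathbb{E}_{q_\varphi(L \mid r, e_s, e_d)}\!\left[\log p_\theta(r \mid L)\right]
\;-\; D_{\mathrm{KL}}\!\left(q_\varphi(L \mid r, e_s, e_d) \,\middle\|\, p_\beta(L \mid e_s, e_d)\right)
```

The first term rewards paths from which the reasoner recovers the correct relation; the KL term keeps the prior, which does not see r and is therefore usable at test time, close to the posterior.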

