Abstract: In recent years, earthquakes in China and abroad have caused heavy losses of life and property. The massive volume of earthquake disaster information on the Internet can support decision making for emergency response and timely rescue, so efficient and rapid processing of this information is urgently needed. To process earthquake disaster information from the Internet, an earthquake event model, a webpage object model, and related models are defined, and the convergence of Web information is defined using limits, which characterizes how disaster information spreads on the Internet. Based on the timeliness of disaster information, a Web information extraction algorithm that supports dynamic convergence is proposed to extract disaster information from the Internet. A method for time-series statistics on how disaster information changes over time is also proposed; it produces statistical reports that provide a basis for rescue decisions. An intelligent Internet information processing system for earthquake emergency response was designed and implemented, and the models and methods were verified in a practical engineering project.
Table 1. Comparison of recall and precision among the three algorithms (unit: %)

Redundancy   Recall                        Precision
             SNM    MPN    This paper      SNM    MPN    This paper
100          80     90     94              83     88     93
200          79     91     96              88     89     95
300          76     90     95              90     90     95
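The text here does not state how the recall and precision figures in Table 1 were computed. As a minimal sketch, assuming the standard definitions (recall is the share of relevant items that were found; precision is the share of found items that are relevant), the two ratios could be obtained as follows. The function name `recall_precision` and the sample item IDs are hypothetical and serve only as illustration; they are not taken from the paper.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) in percent.

    Assumes the standard definitions:
      recall    = |retrieved AND relevant| / |relevant|
      precision = |retrieved AND relevant| / |retrieved|
    Both arguments are collections of hashable item identifiers.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # items correctly found
    recall = 100.0 * hits / len(relevant) if relevant else 0.0
    precision = 100.0 * hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical data: 5 relevant items, 5 retrieved, 4 of them correct.
relevant_ids = {1, 2, 3, 4, 5}
retrieved_ids = {1, 2, 3, 4, 11}
recall, precision = recall_precision(retrieved_ids, relevant_ids)
print(f"recall={recall:.0f}%, precision={precision:.0f}%")
# prints "recall=80%, precision=80%"
```

Using sets keeps the sketch independent of item order and of repeated entries; empty inputs return 0 rather than raising a division error.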