Extracting thematic communities from Wikipedia
Abstract: The current search module in Wikipedia relies on simple keyword matching, which makes search relatively inefficient. To improve the efficiency of knowledge retrieval from Wikipedia, an algorithm named term distance based on linkage (TDL) is proposed. TDL defines a new measure of distance between two keywords and uses it to organize keywords into clusters. The measure is derived from an analysis of Wikipedia's internal link structure, carried out on a scalable computational model, and ranking and recommendation mechanisms are introduced on top of it. Experiments on the May 2009 snapshot of Wikipedia indicate that TDL effectively improves the accuracy of knowledge retrieval in Wikipedia, and a user evaluation confirms that it improves recognition of user intent, and with it user satisfaction, by 7% compared with the existing search.
Key words:
- Wikipedia
- link analysis
- knowledge discovery in databases
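
The abstract does not give the TDL formula, so the following is only a minimal sketch of the general idea it describes: measure how far apart two terms are in the internal link graph, then group nearby terms into clusters. Everything here is an assumption made for illustration, not the authors' method: the distance is plain shortest-path hop count over undirected links, the grouping is a simple threshold merge, and the names `link_distance`, `cluster_terms`, `max_distance`, and the toy link table are hypothetical.

```python
from collections import deque


def link_distance(links, source, target):
    """Hop count of the shortest link path between two articles.

    `links` maps an article title to the titles it links to. Links are
    treated as undirected (an assumption). Returns None if no path exists.
    """
    # Build an undirected adjacency map from the outgoing-link table.
    neighbours = {}
    for page, outgoing in links.items():
        neighbours.setdefault(page, set())
        for other in outgoing:
            neighbours[page].add(other)
            neighbours.setdefault(other, set()).add(page)

    if source not in neighbours or target not in neighbours:
        return None

    # Breadth-first search yields the minimum number of hops.
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        page, dist = queue.popleft()
        if page == target:
            return dist
        for nxt in neighbours[page]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def cluster_terms(links, terms, max_distance=1):
    """Group terms so that each one is within `max_distance` hops of
    at least one other member of its cluster (single-linkage style)."""

    def is_close(a, b):
        d = link_distance(links, a, b)
        return d is not None and d <= max_distance

    clusters = []
    for term in terms:
        near, far = [], []
        for cluster in clusters:
            hit = any(is_close(term, member) for member in cluster)
            (near if hit else far).append(cluster)
        # Merge the new term with every cluster it touches.
        clusters = far + [{term}.union(*near)]
    return clusters


# Toy link table (invented for this example, not real Wikipedia data).
toy_links = {
    "Python": ["Guido van Rossum"],
    "Guido van Rossum": ["Netherlands"],
    "Jazz": ["Saxophone"],
}
toy_terms = ["Python", "Guido van Rossum", "Jazz", "Saxophone", "Netherlands"]
toy_clusters = cluster_terms(toy_links, toy_terms, max_distance=1)
```

On the toy table, the five terms fall into two clusters, one around "Python" and one around "Jazz". A real system would need a weighted distance and a scalable graph computation rather than repeated breadth-first searches; this sketch only shows the shape of the idea.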