Journal of Beijing University of Aeronautics and Astronautics ›› 2009, Vol. 35 ›› Issue (10): 1283-1286.

• Paper

Semantic extraction in Wikipedia

Yu Yang1, Lin Zhangxi2, Xia Guoping1

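  1. School of Economics and Management, Beijing University of Aeronautics and Astronautics, Beijing 100191, China;
  2. Rawls College of Business Administration, Texas Tech University, Texas 79410, USA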
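  • Received: 2008-11-30 Online: 2009-10-31 Published: 2010-09-16
  • About the author: Yu Yang (born 1981), male, from Hangzhou, Zhejiang, PhD candidate, bjyuyang@gmail.com.
  • Funding:

    National Natural Science Foundation of China (70671007)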

Extracting thematic communities from Wikipedia

Yu Yang1, Lin Zhangxi2, Xia Guoping1   

  1. School of Economics and Management, Beijing University of Aeronautics and Astronautics, Beijing 100191, China;
  2. The Rawls College of Business Administration, Texas Tech University, Texas 79410, U.S.A.
  • Received: 2008-11-30 Online: 2009-10-31 Published: 2010-09-16

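Summary: The existing search module of Wikipedia relies on keyword matching, which makes search relatively inefficient. To improve the efficiency of knowledge acquisition in Wikipedia, a link-analysis-based term distance algorithm (TDL, Term Distance based on Linkage) is proposed. Using a scalable computational model, it discovers term clusters by analyzing the internal link structure, and it introduces ranking and recommendation mechanisms. Experiments on the May 2009 snapshot of Wikipedia show that TDL effectively improves the accuracy of knowledge retrieval in Wikipedia, and a user evaluation confirms that TDL improves the recognition of user intent by 7%.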

Abstract: The current search module in Wikipedia is inefficient because it is built on simple keyword matching. To improve the efficiency of knowledge retrieval from Wikipedia by exploiting the links among its articles, an algorithm named term distance based on linkage (TDL) is proposed. TDL defines a new measure of distance between two keywords and uses it to organize keywords into clusters. It is based on link structure analysis underpinned by computational models, and it introduces a ranking and recommendation mechanism. An experiment based on the May 2009 snapshot of Wikipedia indicates that TDL significantly increases the accuracy of knowledge retrieval in Wikipedia, and that it improves user satisfaction by 7% compared with the existing search.
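
The abstract does not give the TDL distance formula, so the following is only a minimal sketch of the general idea (a link-based distance between terms, clustering, then ranking), not the authors' algorithm. The distance used here, one minus the Jaccard similarity of link neighborhoods, is a stand-in assumption, and the link graph, the function names and the 0.6 threshold are all hypothetical.

```python
from itertools import combinations

# Toy internal-link graph: article title -> titles it links to (hypothetical data).
LINKS = {
    "Python": {"Programming language", "Guido van Rossum"},
    "Java": {"Programming language", "James Gosling"},
    "Programming language": {"Python", "Java"},
    "Jazz": {"Music", "Saxophone"},
    "Saxophone": {"Music", "Jazz"},
    "Music": {"Jazz", "Saxophone"},
}

def neighborhood(term):
    # A term's neighborhood: itself plus its outgoing and incoming links.
    outgoing = LINKS.get(term, set())
    incoming = {src for src, dsts in LINKS.items() if term in dsts}
    return {term} | outgoing | incoming

def link_distance(a, b):
    # Stand-in distance (an assumption, not the TDL formula):
    # 1 - Jaccard similarity of the two link neighborhoods.
    na, nb = neighborhood(a), neighborhood(b)
    return 1.0 - len(na & nb) / len(na | nb)

def clusters(terms, threshold=0.6):
    # Single-link grouping with union-find: terms closer than the
    # threshold end up in the same cluster.
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if link_distance(a, b) < threshold:
            parent[find(a)] = find(b)
    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())

def recommend(query, terms, k=3):
    # Rank the other terms by increasing distance from the query term;
    # ties are broken alphabetically.
    others = [t for t in terms if t != query]
    return sorted(others, key=lambda t: (link_distance(query, t), t))[:k]
```

On this toy graph, `clusters(list(LINKS))` separates the programming terms from the music terms, and `recommend("Python", list(LINKS), k=2)` ranks "Programming language" ahead of "Java". A real system would build `LINKS` from a Wikipedia dump and would need an index of incoming links rather than the linear scan used here.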


