Journal of Beijing University of Aeronautics and Astronautics 2009, Vol. 35 Issue (10): 1283-1286
Extracting thematic communities from Wikipedia
Yu Yang1, Lin Zhangxi2, Xia Guoping1*
1. School of Economics and Management, Beijing University of Aeronautics and Astronautics, Beijing 100191, China;
2. The Rawls College of Business Administration, Texas Tech University, Texas 79410, U.S.A.

Abstract: The current search module in Wikipedia has low search efficiency because it relies on simple keyword matching. To improve the efficiency of knowledge retrieval across Wikipedia's topic areas, with more accurate links among them, an algorithm named term distance based on linkage (TDL) is proposed. TDL defines a new measure of distance between two keywords and uses it to organize keywords into clusters. It is based on link structure analysis supported by computational models. A ranking and recommendation mechanism is also introduced. An experiment based on a snapshot of Wikipedia (May 2009) indicates that TDL significantly increases the accuracy of knowledge retrieval in Wikipedia and improves user satisfaction by 7% compared with the existing search.
Keywords: Wikipedia; link analysis; knowledge discovery in databases
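
The abstract does not give the TDL formula, so the following is only an illustrative sketch of the general idea of a link-based distance between terms, not the authors' algorithm. It assumes a Jaccard distance over each article's set of internal links as a stand-in measure. The names `link_distance` and `rank_related` and the toy `links` data are hypothetical.

```python
# Illustrative sketch only: Jaccard distance over link sets is an assumed
# stand-in for a link-based term distance, NOT the TDL formula of the paper.

# Hypothetical toy link graph: article title -> titles it links to.
links = {
    "Data mining": {"Machine learning", "Database", "Statistics"},
    "Machine learning": {"Statistics", "Data mining", "Algorithm"},
    "Database": {"SQL", "Data mining"},
    "Baroque music": {"Johann Sebastian Bach", "Harpsichord"},
}


def link_distance(graph, a, b):
    """Return a distance in [0, 1] between two terms.

    Each term is represented by itself plus the set of articles it links to.
    0 means identical link neighbourhoods, 1 means nothing in common.
    """
    na = graph.get(a, set()) | {a}
    nb = graph.get(b, set()) | {b}
    return 1.0 - len(na & nb) / len(na | nb)


def rank_related(graph, term, k=3):
    """Return the k terms closest to `term`, nearest first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    others = [t for t in graph if t != term]
    others.sort(key=lambda t: (link_distance(graph, term, t), t))
    return others[:k]


if __name__ == "__main__":
    for other in rank_related(links, "Data mining"):
        print(other, round(link_distance(links, "Data mining", other), 2))
```

Under these assumptions, terms whose articles share many internal links end up close together, which is the kind of signal a ranking or recommendation step could use in place of plain keyword matching.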
Received 2008-11-30;


About author: Yu Yang (1981-), bjyuyang@gmail.com.
Yu Yang, Lin Zhangxi, Xia Guoping. Extracting thematic communities from Wikipedia[J]. JOURNAL OF BEIJING UNIVERSITY OF AERONAUTICS AND A, 2009, V35(10): 1283-1286
