Journal of Beijing University of Aeronautics and Astronautics, 2004, Vol. 30, Issue (09): 835-838
Study on an XML approximately duplicated data cleaning method
Chen Wei, Ding Qiulin*
Computer Application Institute, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Abstract: Semi-structured XML data plays an important role in data cleaning, so this paper studies how to clean approximately duplicated XML data and proposes an efficient cleaning method for it. The method is adaptive: any approximate-duplicate detection algorithm can be plugged into it. The paper also presents an efficient detection algorithm based on tree edit distance. Lower and upper bounds on the tree edit distance are used to optimize this detection algorithm. The improved algorithm avoids computing the tree edit distance for pairs of XML data where it is not needed, which reduces the cost of approximate comparison. This work lays a foundation for further research on cleaning approximately duplicated XML data.
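The abstract does not spell out the detection algorithm or its bounds. The following is therefore only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It treats each XML document as an ordered, labeled tree built from element tags alone, ignoring attributes and text. It assumes a unit cost for every node insertion, deletion, or relabeling. The exact distance uses the classic memoized forest recursion that underlies the Zhang-Shasha algorithm. The two bounds are illustrative choices, not taken from the paper. The lower bound is max(|T1|, |T2|) minus the size of the shared label multiset. The upper bound is the cost of a top-down mapping that pairs children by position. The helper names (`to_tree`, `lower_bound`, `upper_bound`, `is_approx_duplicate`) and the threshold `tau` are hypothetical.

```python
"""Sketch: tree-edit-distance duplicate detection with bound-based filtering.
Assumptions (not from the paper): unit edit costs, element tags only,
and the two bounds below chosen for illustration."""
from collections import Counter
from functools import lru_cache
import xml.etree.ElementTree as ET

# A tree is (label, children), children being a tuple of trees;
# a forest is a tuple of trees, ordered left to right.


def to_tree(elem):
    """Convert an ElementTree element to an ordered labeled tree.
    Assumption: attributes and text content are ignored."""
    return (elem.tag, tuple(to_tree(child) for child in elem))


@lru_cache(maxsize=None)
def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


def labels(tree):
    """Multiset of node labels in a tree."""
    counts = Counter([tree[0]])
    for child in tree[1]:
        counts.update(labels(child))
    return counts


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Exact unit-cost edit distance between two ordered forests,
    decomposing on the rightmost root of each forest."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + v_kids, g) + 1,  # delete root v, keep its children
        forest_dist(f, g[:-1] + w_kids) + 1,  # insert root w
        forest_dist(v_kids, w_kids)           # map v to w ...
        + forest_dist(f[:-1], g[:-1])
        + (v_label != w_label),               # ... relabeling if labels differ
    )


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))


def lower_bound(t1, t2):
    """A mapping of m node pairs costs (|T1| - m) + (|T2| - m) + relabels,
    and at most |shared labels| pairs can have equal labels, which gives
    max(|T1|, |T2|) - |shared labels| <= TED."""
    shared = sum((labels(t1) & labels(t2)).values())
    return max(size((t1,)), size((t2,))) - shared


def upper_bound(t1, t2):
    """Cost of one valid mapping: root to root, i-th child to i-th child,
    leftover child subtrees deleted or inserted. Any mapping costs >= TED."""
    (l1, c1), (l2, c2) = t1, t2
    cost = int(l1 != l2)
    for a, b in zip(c1, c2):
        cost += upper_bound(a, b)
    k = min(len(c1), len(c2))
    return cost + size(c1[k:]) + size(c2[k:])


def is_approx_duplicate(t1, t2, tau):
    """Return (decision, how_decided) for threshold tau (hypothetical API)."""
    if lower_bound(t1, t2) > tau:
        return False, "lower bound"   # TED >= lb > tau: skip exact TED
    if upper_bound(t1, t2) <= tau:
        return True, "upper bound"    # TED <= ub <= tau: skip exact TED
    return tree_edit_distance(t1, t2) <= tau, "exact"


if __name__ == "__main__":
    a = to_tree(ET.fromstring("<book><title/><author/><year/></book>"))
    b = to_tree(ET.fromstring("<book><title/><author/><price/></book>"))
    c = to_tree(ET.fromstring("<cd><artist/><tracks><track/><track/></tracks></cd>"))
    print(is_approx_duplicate(a, b, tau=1))  # (True, 'upper bound')
    print(is_approx_duplicate(a, c, tau=1))  # (False, 'lower bound')
```

The filter rejects a pair without the exact computation when the lower bound already exceeds `tau`. It accepts a pair when the upper bound is already within `tau`. Only the remaining pairs pay for the exact distance. In the demo, `b` is accepted by the upper bound and `c` is rejected by the lower bound. A pair such as `a` versus a copy with an extra leading child falls between the bounds and needs the exact computation.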
Keywords: rules library; algorithms library; data cleaning; extensible markup language (XML); approximately duplicated data
Received 2003-06-02;
Chen Wei, Ding Qiulin. Study on an XML approximately duplicated data cleaning method[J]. JOURNAL OF BEIJING UNIVERSITY OF AERONAUTICS AND A, 2004, V30(09): 835-838
http://bhxb.buaa.edu.cn//CN/Y2004/V30/I09/835
