Volume 30 Issue 09
Sep. 2004
Chen Wei, Ding Qiulin. Study on an XML approximately duplicated data cleaning method[J]. Journal of Beijing University of Aeronautics and Astronautics, 2004, 30(09): 835-838. (in Chinese)

Study on an XML approximately duplicated data cleaning method

  • Received Date: 02 Jun 2003
  • Publish Date: 30 Sep 2004
  • Given the importance of semi-structured XML data in data cleaning, this paper studies how to clean approximately duplicated XML data and proposes an efficient cleaning method for it. The method is adaptive: any approximate duplicate detection algorithm can be plugged into it. An efficient detection algorithm based on tree edit distance is presented, which detects approximately duplicated XML data effectively. Lower and upper bounds on the tree edit distance are then used to optimize this detection algorithm. The improved algorithm avoids computing the tree edit distance for pairs of XML data where it is unnecessary, which reduces the cost of approximate matching. This work lays a foundation for further research on cleaning approximately duplicated XML data.
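The optimization described above is a filter-then-verify scheme. Cheap lower and upper bounds on the tree edit distance settle most pairs, and the exact distance is computed only when the bounds cannot. The abstract does not state the paper's actual bounds, cost model, or threshold, so the following is only a minimal sketch under assumptions. It uses unit edit costs and the Zhang-Shasha algorithm for exact ordered tree edit distance. For lower bounds it uses the size difference and the string edit distance of preorder label sequences, and for the upper bound a Selkow-style top-down distance. These are stand-ins, not necessarily the paper's choices. The threshold `tau` and every function name (`from_xml`, `is_approx_duplicate`, and so on) are hypothetical. The XML-to-tree mapping keeps only element tags and text values.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Ordered, labelled tree node (an XML element tag or a text value)."""
    label: str
    children: List["Node"] = field(default_factory=list)

def from_xml(elem: ET.Element) -> Node:
    # Assumption: element text becomes a leaf child; attributes and tail text are ignored.
    kids = [from_xml(c) for c in elem]
    if elem.text and elem.text.strip():
        kids.insert(0, Node(elem.text.strip()))
    return Node(elem.tag, kids)

def size(t: Node) -> int:
    return 1 + sum(size(c) for c in t.children)

def preorder(t: Node) -> List[str]:
    return [t.label] + [lab for c in t.children for lab in preorder(c)]

def string_edit_distance(s: List[str], t: List[str]) -> int:
    # Classic Levenshtein distance over label sequences, one row at a time.
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def top_down_distance(t1: Node, t2: Node) -> int:
    """Upper bound: roots are always matched; only whole subtrees are inserted or deleted."""
    a, b = t1.children, t2.children
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for x in range(1, len(a) + 1):
        d[x][0] = d[x - 1][0] + size(a[x - 1])
    for y in range(1, len(b) + 1):
        d[0][y] = d[0][y - 1] + size(b[y - 1])
    for x in range(1, len(a) + 1):
        for y in range(1, len(b) + 1):
            d[x][y] = min(d[x - 1][y] + size(a[x - 1]), d[x][y - 1] + size(b[y - 1]),
                          d[x - 1][y - 1] + top_down_distance(a[x - 1], b[y - 1]))
    return int(t1.label != t2.label) + d[len(a)][len(b)]

def _postorder(t: Node):
    """Postorder labels plus each node's leftmost-leaf descendant index."""
    labels: List[str] = []
    lmd: List[int] = []
    def visit(n: Node) -> int:
        kids = [visit(c) for c in n.children]
        labels.append(n.label)
        lmd.append(lmd[kids[0]] if kids else len(labels) - 1)
        return len(labels) - 1
    visit(t)
    return labels, lmd

def tree_edit_distance(t1: Node, t2: Node) -> int:
    """Exact ordered tree edit distance with unit costs (Zhang-Shasha)."""
    l1, lmd1 = _postorder(t1)
    l2, lmd2 = _postorder(t2)
    # Keyroots: the highest postorder index for each leftmost-leaf value.
    kr1 = sorted({l: i for i, l in enumerate(lmd1)}.values())
    kr2 = sorted({l: i for i, l in enumerate(lmd2)}.values())
    td = [[0] * len(l2) for _ in l1]
    for i in kr1:
        for j in kr2:
            li, lj = lmd1[i], lmd2[j]
            fd = [[x + y if x == 0 or y == 0 else 0 for y in range(j - lj + 2)]
                  for x in range(i - li + 2)]
            for x in range(1, i - li + 2):
                for y in range(1, j - lj + 2):
                    u, v = li + x - 1, lj + y - 1  # postorder indices of current nodes
                    if lmd1[u] == li and lmd2[v] == lj:  # both forests are whole trees
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[x - 1][y - 1] + (l1[u] != l2[v]))
                        td[u][v] = fd[x][y]
                    else:
                        p, q = lmd1[u] - li, lmd2[v] - lj
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[p][q] + td[u][v])
    return td[-1][-1]

def is_approx_duplicate(t1: Node, t2: Node, tau: int) -> bool:
    """True iff TED(t1, t2) <= tau; exact TED is computed only when bounds cannot decide."""
    if abs(size(t1) - size(t2)) > tau:  # cheapest lower bound
        return False
    if string_edit_distance(preorder(t1), preorder(t2)) > tau:  # tighter lower bound
        return False
    if top_down_distance(t1, t2) <= tau:  # upper bound already within the threshold
        return True
    return tree_edit_distance(t1, t2) <= tau
```

For example, `from_xml(ET.fromstring("<book><title>XML</title><year>2004</year></book>"))` and the same record with `2003` differ by one relabel, so they are reported as duplicates for `tau=1` without ever running the exact algorithm. For unit costs the bounds satisfy size difference ≤ preorder string distance ≤ tree edit distance ≤ top-down distance, so every pair a bound decides gets the same answer as the exact computation. Because the detector is just a `(t1, t2, tau) -> bool` predicate, another approximate matching method could be swapped in for `is_approx_duplicate`.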

     

