一种有效压缩频繁模式挖掘的算法

童咏昕; 马世龙; 李 钰

一种有效压缩频繁模式挖掘的算法

1.
北京航空航天大学软件学院, 北京 100191
2. 北京航空航天大学计算机学院, 北京 100191

基金项目:

国家973计划资助项目(2005CB321902)

详细信息

作者简介:
童咏昕(1982-),男,北京人,硕士生,yxtong@nlsde.buaa.edu.cn.

中图分类号: TP 311.13
计量
- 文章访问数: 3697
- HTML全文浏览量: 316
- PDF下载量: 1365
- 被引次数: 0
出版历程
- 收稿日期: 2008-08-10
- 网络出版日期: 2009-05-31

Effective algorithm for mining compressed frequent patterns

1.
School of Software, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
2. School of Computer Science and Technology, Beijing University of Aeronautics and Astronautics, Beijing 100191, China

摘要

摘要: 频繁模式挖掘的研究最近致力于在一个合理的容错范围内寻找有代表性的模式来压缩庞大的挖掘结果集.一种新型启发式算法AMSA(Approximating Mining based Simulated Annealing)被提出,其采用了模拟退火思想来保证有效性和压缩的质量.依据FIMI(Frequent Itemset Mining Implementations Repository)提供的公用数据集进行的实验结果也证明了这一结论.通过与FPclose算法和RPglobal算法分别进行了性能的比较,AMSA挖掘的结果集规模小于FPclose算法和RPglobal算法得到的结果集规模,特别是当支持度阈值很低时,RPglobal不可在合理时间内产生结果集,AMSA却可在合理时间内得出较精准的结果集.
- 数据挖掘 /
- 模拟退火 /
- 启发式方法
Abstract: Researches of frequent-pattern mining have recently focused on discovering representative patterns to compress a large of results within a reasonable tolerance bound. A novel heuristic algorithm, approximating mining based simulated annealing (AMSA), was proposed. The algorithm uses a method based simulated-annealing to improve efficiency and quality of the compression. Our experimental studies demonstrate the algorithm is efficient and high quality on a common dataset supported by frequent itemset mining implementations repository (FIMI). The mining result of AMSA is smaller than mining results of FPclose and RPglobal by performance study. Especially, if min_sup threshold is low, RPglobal fails to generate any result within reasonable time range, while AMSA generates a concise and succinct mining result.
- data mining /
- simulated annealing /
- heuristic method

HTML全文

参考文献(1)

[1] Agrawal R, Srikant R. Fast algorithms for mining association rules Proc of 1994 Int conf on VLDB. Santiago, Chile: VLDB, 1994:487-499 [2] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation Proc of the 2000 ACM SIGMOD. Dallas, USA: ACM ,2000:1-12 [3] Pasquier N, Bastide Y, Taouil R, et al. Discovering frequent closed itemsets for association rules Proc of the 7^th ICDT. Jerusalem, Israel: IEEE, 1999:134-145 [4] Wang J, Han J, Pei J. CLOSET+: Searching for the best strategies for mining frequent closed Itemsets Proc of the 2003 ACM SIGKDD. Washington DC, USA: ACM, 2003:236-245 [5] Grahne G, Zhu J. Efficiently using prefix-trees in mining frequent itemsets Proc of IEEE ICDM Workshop on FIMI. Melbourne, FL: IEEE, 2003:123-132 [6] Bayardo R. Efficiently mining long patterns from databases Proc of the ACM SIGMOD. Seattle, USA: ACM, 1998:85-93 [7] Afrati F N, Gionis A, Mannila H. Approximating a collection of frequent sets Proc of the 2004 ACM SIGKDD. Seattle, USA: ACM, 2004:12-19 [8] Xin D, Han J, Yan Y, et al. Mining compressed frequent-pattern sets Proc of the 2005 VLDB. Trondheim, Norway: VLDB, 2005:709-720

施引文献

资源附件(0)

访问统计