Effective algorithm for mining compressed frequent patterns
-
摘要: 频繁模式挖掘的研究最近致力于在一个合理的容错范围内寻找有代表性的模式来压缩庞大的挖掘结果集.一种新型启发式算法AMSA(Approximating Mining based Simulated Annealing)被提出,其采用了模拟退火思想来保证有效性和压缩的质量.依据FIMI(Frequent Itemset Mining Implementations Repository)提供的公用数据集进行的实验结果也证明了这一结论.通过与FPclose算法和RPglobal算法分别进行了性能的比较,AMSA挖掘的结果集规模小于FPclose算法和RPglobal算法得到的结果集规模,特别是当支持度阈值很低时,RPglobal不可在合理时间内产生结果集,AMSA却可在合理时间内得出较精准的结果集.Abstract: Researches of frequent-pattern mining have recently focused on discovering representative patterns to compress a large of results within a reasonable tolerance bound. A novel heuristic algorithm, approximating mining based simulated annealing (AMSA), was proposed. The algorithm uses a method based simulated-annealing to improve efficiency and quality of the compression. Our experimental studies demonstrate the algorithm is efficient and high quality on a common dataset supported by frequent itemset mining implementations repository (FIMI). The mining result of AMSA is smaller than mining results of FPclose and RPglobal by performance study. Especially, if min_sup threshold is low, RPglobal fails to generate any result within reasonable time range, while AMSA generates a concise and succinct mining result.
-
Key words:
- data mining /
- simulated annealing /
- heuristic method
-
[1] Agrawal R, Srikant R. Fast algorithms for mining association rules Proc of 1994 Int conf on VLDB. Santiago, Chile: VLDB, 1994:487-499 [2] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation Proc of the 2000 ACM SIGMOD. Dallas, USA: ACM ,2000:1-12 [3] Pasquier N, Bastide Y, Taouil R, et al. Discovering frequent closed itemsets for association rules Proc of the 7th ICDT. Jerusalem, Israel: IEEE, 1999:134-145 [4] Wang J, Han J, Pei J. CLOSET+: Searching for the best strategies for mining frequent closed Itemsets Proc of the 2003 ACM SIGKDD. Washington DC, USA: ACM, 2003:236-245 [5] Grahne G, Zhu J. Efficiently using prefix-trees in mining frequent itemsets Proc of IEEE ICDM Workshop on FIMI. Melbourne, FL: IEEE, 2003:123-132 [6] Bayardo R. Efficiently mining long patterns from databases Proc of the ACM SIGMOD. Seattle, USA: ACM, 1998:85-93 [7] Afrati F N, Gionis A, Mannila H. Approximating a collection of frequent sets Proc of the 2004 ACM SIGKDD. Seattle, USA: ACM, 2004:12-19 [8] Xin D, Han J, Yan Y, et al. Mining compressed frequent-pattern sets Proc of the 2005 VLDB. Trondheim, Norway: VLDB, 2005:709-720
点击查看大图
计量
- 文章访问数: 3293
- HTML全文浏览量: 197
- PDF下载量: 1246
- 被引次数: 0