Online mining frequent closed itemsets over data stream
Abstract: Based on the LossyCounting algorithm, a novel online approach called LC_Closed (LossyCounting_Closed) is proposed for mining frequent closed itemsets over data streams. A prefix-tree-based compressed summary structure, CI-forest (ClosedItemsets-forest), is designed to maintain only the frequent closed itemsets. CI-forest supports fast insertion and lookup of closed itemsets, and lets the algorithm quickly locate the historical closed itemsets related to each newly arriving transaction. Because the closed itemsets are maintained online, the current frequent closed itemsets can be output in real time according to user-specified thresholds. Experimental results show that the proposed method is effective.
Key words:
- data mining
- data stream
- frequent closed itemsets
- online
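For readers unfamiliar with the base technique, the following is a minimal sketch of plain Lossy Counting (Manku and Motwani) over single items, which the abstract names as the starting point for LC_Closed. It is not the LC_Closed algorithm and does not model CI-forest; the class name `LossyCounter`, its methods, and the parameter names `epsilon` and `support` are assumptions made for illustration only.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Illustrative single-item Lossy Counting (not the paper's LC_Closed)."""

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # number of items per bucket
        self.n = 0                           # items seen so far
        # item -> (observed count, maximum possible undercount)
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        # A new item may have been missed in up to (bucket - 1) earlier buckets.
        count, delta = self.entries.get(item, (0, bucket - 1))
        self.entries[item] = (count + 1, delta)
        # At each bucket boundary, drop entries that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                key: (c, d)
                for key, (c, d) in self.entries.items()
                if c + d > bucket
            }

    def frequent(self, support: float) -> List[Hashable]:
        """Items whose observed count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return sorted(k for k, (c, _) in self.entries.items() if c >= threshold)


if __name__ == "__main__":
    counter = LossyCounter(epsilon=0.2)
    for item in ["a"] * 6 + ["b"] * 3 + ["c"]:
        counter.add(item)
    print(counter.frequent(0.4))  # prints ['a', 'b']
```

The pruning step is what keeps memory bounded: an entry survives only if its count plus its possible undercount exceeds the current bucket id. Extending this idea from single items to closed itemsets is the part the paper addresses and is not attempted here.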