Journal of Beijing University of Aeronautics and Astronautics, 2008, Vol. 34, Issue (8): 969-972.


Online mining frequent closed itemsets over data stream

Liu Chun, Zheng Zheng, Cai Kaiyuan, Zhang Shichao

  1. School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
  • Received: 2007-07-20 Online: 2008-08-31 Published: 2010-09-17
  • About the author: Liu Chun (1982-), male, from Xinyang, Henan; master's student; liuchun@asee.buaa.edu.cn.
  • Supported by: National Natural Science Foundation of China (60633010); China Postdoctoral Science Foundation (20070410453)

Abstract: Building on the LossyCounting algorithm, this paper proposes LossyCounting_Closed (LC_Closed), an online algorithm for mining frequent closed itemsets over data streams. A prefix-tree-based summary structure, the Closed-Itemsets-forest (CI-forest), is designed to store only the frequent closed itemsets in compressed form. CI-forest supports fast insertion and querying of closed itemsets, and it quickly locates the historical closed itemsets related to each newly arriving transaction. Because the algorithm maintains the closed itemsets online, the current frequent closed itemsets can be output in real time according to user-specified thresholds. Experimental results show that the method is effective.
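
For readers unfamiliar with the base algorithm named in the abstract, the following is a minimal Python sketch of basic single-item Lossy Counting as it is commonly described (Manku and Motwani). It is not LC_Closed and does not model CI-forest, neither of which is specified on this page. The function names `lossy_counting` and `frequent_items` and the parameters `epsilon` and `support` are illustrative assumptions.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple

Entry = Tuple[int, int]  # (observed count f, maximum undercount delta)


def lossy_counting(stream: Iterable[Hashable],
                   epsilon: float) -> Tuple[Dict[Hashable, Entry], int]:
    """Basic single-item Lossy Counting (illustrative sketch only).

    Returns the surviving entries and the number of items seen.
    """
    width = math.ceil(1 / epsilon)      # bucket width w = ceil(1 / epsilon)
    entries: Dict[Hashable, Entry] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)   # id of the current bucket
        if item in entries:
            f, delta = entries[item]
            entries[item] = (f + 1, delta)
        else:
            # A new entry may have been missed in up to (bucket - 1) earlier buckets.
            entries[item] = (1, bucket - 1)
        if n % width == 0:
            # Bucket boundary: drop entries whose count bound f + delta <= bucket id.
            entries = {k: (f, d) for k, (f, d) in entries.items()
                       if f + d > bucket}
    return entries, n


def frequent_items(entries: Dict[Hashable, Entry], n: int,
                   support: float, epsilon: float) -> List[Hashable]:
    """Report items whose stored count is at least (support - epsilon) * n."""
    return sorted(k for k, (f, _) in entries.items()
                  if f >= (support - epsilon) * n)
```

Calling `lossy_counting(list("abacaadaba"), 0.2)` and then `frequent_items(entries, n, 0.5, 0.2)` on the result exercises both steps. Stored counts undercount true frequencies by at most `epsilon * n`, which is why the reporting threshold is relaxed by `epsilon`.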
