Journal of Beijing University of Aeronautics and Astronautics, 2006, Vol. 32, Issue (02): 218-223.


Maximal frequent itemsets mining algorithm based on effective pruning mechanisms

Chen Peng, Lü Weifeng

  1. School of Computer Science and Technology, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
  • Received: 2005-01-10 Online: 2006-02-28 Published: 2010-09-20
  • About the author: Chen Peng (b. 1974), male, from Chenzhou, Hunan, Ph.D. candidate, pchen@nlsde.buaa.edu.cn.
  • Supported by: National Key Basic Research Development Program of China (G1999032709); National Natural Science Foundation of China (90104008)

Abstract: The problem of mining maximal frequent itemsets in association mining is studied, and an algorithm based on pruning the itemset lattice is proposed. Using the itemset lattice tree as the data structure, the mining of maximal frequent itemsets is transformed into a depth-first search of that tree that collects all maximal frequent nodes. A key measure for improving the efficiency of the algorithm is to prune the itemset lattice tree while traversing it. Three properties of the itemset lattice tree are given, and on this basis three pruning mechanisms are proposed: direct superset pruning, indirect superset pruning, and transaction set equivalence pruning. They skip infrequent nodes and the extension nodes generated from them wherever possible, reducing the number of nodes traversed. Experimental results show that all three pruning mechanisms effectively reduce the search space, that transaction set equivalence pruning is the most effective of the three, and that the performance of the algorithm depends on the density of the input dataset.
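The abstract names the three pruning mechanisms but does not define them, so the following is only a minimal sketch of a depth-first maximal frequent itemset search with generic prunings of the same flavor, not the authors' algorithm. The function name `maximal_frequent_itemsets`, the vertical (item to transaction-id set) representation, and the exact pruning rules are all assumptions made for illustration: infrequent extensions are dropped, a subtree is skipped when its largest possible itemset is already contained in a known maximal itemset, and an item whose addition leaves the supporting transaction set unchanged is merged into the current node without branching.

```python
def maximal_frequent_itemsets(transactions, min_support):
    """Illustrative DFS over the itemset lattice (assumed rules, not the paper's)."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)
    maximal = []  # maximal frequent itemsets found so far (frozensets)

    def covered(itemset):
        # True if itemset is contained in an already-found maximal itemset.
        return any(itemset <= m for m in maximal)

    def dfs(head, head_tids, tail):
        # Keep only extensions that are still frequent (anti-monotonicity).
        ext = []
        for x in tail:
            tids = head_tids & tidsets[x]
            if len(tids) >= min_support:
                ext.append((x, tids))
        # Equivalence-style pruning (assumed): if adding x does not change the
        # supporting transactions, x belongs to every maximal superset of head.
        head = head | {x for x, tids in ext if tids == head_tids}
        ext = [(x, tids) for x, tids in ext if tids != head_tids]
        # Superset-style pruning (assumed): nothing new can lie below this node
        # if head plus all remaining extensions is already covered.
        if covered(head | {x for x, _ in ext}):
            return
        if not ext:
            if head:
                maximal.append(head)
            return
        for i, (x, tids) in enumerate(ext):
            dfs(head | {x}, tids, [y for y, _ in ext[i + 1:]])

    dfs(frozenset(), set(range(len(transactions))), items)
    return maximal


# Toy data set, invented for this example.
toy = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"c", "d"}]
print([sorted(m) for m in maximal_frequent_itemsets(toy, 2)])
# prints [['a', 'b', 'c'], ['d']]
```

On the toy data, the branch rooted at `a` absorbs `b` without branching because every transaction containing `a` also contains `b`, and the branches rooted at `b` and `c` are cut because their candidates already lie inside `{a, b, c}`.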


