Maximal frequent itemsets mining algorithm based on effective pruning mechanisms
Abstract: This paper studies the problem of mining maximal frequent itemsets in association mining and proposes an algorithm based on pruning the itemset lattice. The algorithm adopts an itemset lattice tree as its data structure, which turns maximal frequent itemset mining into a depth-first search of that tree for all maximal frequent nodes. A key measure for improving the efficiency of the algorithm is to prune the itemset lattice tree while traversing it. Three properties of the itemset lattice tree are given, and three pruning mechanisms are derived from them: direct superset pruning, indirect superset pruning, and transaction-set equivalence pruning. Wherever possible, these mechanisms skip infrequent nodes and the extension nodes they generate, which reduces the number of nodes visited. Experimental results show that all three pruning mechanisms effectively reduce the search space, that transaction-set equivalence pruning is the most effective of the three, and that the performance of the algorithm depends on the density of the input dataset.
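The abstract does not spell out the three properties or the exact pruning rules, so the following is only a minimal sketch of the general idea: a depth-first search over an itemset tree with a superset check and a transaction-set equivalence check. The function name `mine_maximal`, the vertical (item to transaction-id set) layout, the lexicographic item order, and the specific form of both checks are assumptions made for illustration and may differ from the paper's definitions.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def mine_maximal(transactions: Iterable[Set[str]], min_support: int) -> List[FrozenSet[str]]:
    """Return the maximal frequent itemsets (illustrative sketch, not the paper's code)."""
    transactions = list(transactions)

    # Vertical layout (assumed): each item maps to the ids of transactions containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Keep only frequent single items; lexicographic order is an arbitrary choice here.
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)
    maximal: List[FrozenSet[str]] = []

    def dfs(head: FrozenSet[str], head_tids: Set[int], tail: List[str]) -> None:
        # Superset check: if head plus every remaining candidate already lies
        # inside a known maximal itemset, this subtree cannot yield a new one.
        candidate = head | frozenset(tail)
        if any(candidate <= m for m in maximal):
            return

        new_tail: List[str] = []
        for item in tail:
            tids = head_tids & tidsets[item]
            if len(tids) == len(head_tids):
                # Transaction-set equivalence: adding the item does not change
                # the supporting transactions, so absorb it into the head
                # instead of branching on it.
                head = head | {item}
            elif len(tids) >= min_support:
                new_tail.append(item)
            # Otherwise the extension is infrequent and is dropped.

        if not new_tail:
            # Leaf node: record the head unless a known maximal itemset covers it.
            if head and not any(head <= m for m in maximal):
                maximal.append(head)
            return

        for index, item in enumerate(new_tail):
            dfs(head | {item}, head_tids & tidsets[item], new_tail[index + 1:])

    dfs(frozenset(), set(range(len(transactions))), items)
    return maximal


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "d"}, {"a", "c"}]
    for itemset in mine_maximal(data, min_support=2):
        print(sorted(itemset))
```

In this sketch the equivalence check shrinks the tree by merging items that always co-occur with the current node, while the superset check cuts whole subtrees that a previously found maximal itemset already covers. Denser datasets tend to have longer frequent itemsets and thus larger trees.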
Key words:
- data mining
- association rule
- association mining
- lattice