TFP: An efficient algorithm for mining top-K frequent closed itemsets

Cited by: 124
Authors
Wang, JY [1]
Han, JW
Lu, Y
Tzvetkov, P
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Siebel Ctr Sci 2132, Urbana, IL 61801 USA
Funding
US National Science Foundation;
Keywords
data mining; frequent itemset; association rules; mining methods and algorithms;
DOI
10.1109/TKDE.2005.81
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Frequent itemset mining has been studied extensively in the literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without a user-specified min_support threshold. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and the FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further sped up by employing a combined top-down and bottom-up FP-Tree traversal strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
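To make the mining task in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the TFP algorithm: it uses no FP-Tree, no dynamic raising of min_support, and no pruning, and it is only practical for tiny inputs. The function name `top_k_closed_itemsets`, the sample transactions, and the lexicographic tie-breaking among itemsets of equal support are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations


def top_k_closed_itemsets(transactions, k, min_l):
    """Brute-force illustration of the task (NOT the TFP algorithm).

    Returns up to k closed itemsets of length >= min_l with the highest
    support, as (sorted item tuple, support) pairs.
    """
    transactions = [frozenset(t) for t in transactions]
    support = {}
    for t in transactions:
        # Every closed itemset is the closure of some subset of a transaction.
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                items = frozenset(combo)
                # Transactions that contain the candidate itemset.
                cover = [u for u in transactions if items <= u]
                # The closure is the intersection of the covering transactions;
                # it has the same support as the candidate itself.
                closure = frozenset.intersection(*cover)
                support[closure] = len(cover)
    # Apply the minimum-length constraint from the task definition.
    candidates = [
        (tuple(sorted(items)), sup)
        for items, sup in support.items()
        if len(items) >= min_l
    ]
    # Highest support first; ties are broken lexicographically (an assumption).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:k]


if __name__ == "__main__":
    # Hypothetical toy database of four transactions.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c", "d"}]
    print(top_k_closed_itemsets(db, k=2, min_l=2))
```

In this toy database the itemset {b} is not closed, because every transaction containing b also contains a, so it is represented by its closure {a, b}. Note that no min_support value is supplied anywhere: the user states only k and min_l, which is the point of the task definition.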
Pages: 652-664
Number of pages: 13