Efficient algorithms for deriving complete frequent itemsets from frequent closed itemsets

Cited by: 4
Authors
Wu, Cheng-Wei [1]
Huang, JianTao [1]
Lin, Yun-Wei [1]
Chuang, Chien-Yu [1]
Tseng, Yu-Chee [2]
Affiliations
[1] Natl Ilan Univ, Yilan, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Yilan, Taiwan
Keywords
Frequent itemset mining; Frequent closed itemset mining; Lossless and condensed representation; Deriving algorithms;
DOI
10.1007/s10489-020-02172-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Mining frequent itemsets (FIs) from dense datasets usually produces too many itemsets, so the mining task suffers from very long execution times and high memory consumption. Frequent closed itemsets (FCIs) are a compact and lossless representation of FIs. Mining FCIs not only reduces execution time and memory usage but also preserves the complete information needed to derive the FIs. Although many studies have proposed efficient methods for mining FCIs, few have developed algorithms for efficiently deriving FIs from FCIs. In this work, we propose two algorithms, DFI-List and DFI-Growth, for efficiently deriving FIs from FCIs. Both algorithms adopt depth-first search and a divide-and-conquer methodology to derive all the FIs. DFI-List derives all the FIs with a vertical index structure called Cid List. DFI-Growth compresses the information of FCIs into tree structures and applies a pattern-growth strategy to derive FIs from the trees. Empirical experiments show that DFI-List is the most efficient and scalable algorithm on dense datasets. For example, when the minimum support threshold is set to 50% on the Chess dataset, DFI-List runs over 100 times faster than LevelWise (Pasquier et al. Inf Syst 24(1): 25-46, 1999b). DFI-Growth is the most stable and memory-efficient algorithm on sparse datasets. Both DFI-Growth and DFI-List are superior to the state-of-the-art algorithm (Pasquier et al. Inf Syst 24(1): 25-46, 1999b) in terms of execution time.
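To illustrate why FCIs are a lossless representation, the following is a minimal sketch of a naive derivation, not DFI-List or DFI-Growth, whose details the abstract does not give. It relies on the standard property that the support of a frequent itemset equals the largest support among the closed itemsets that contain it. The function name `derive_frequent_itemsets` and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def derive_frequent_itemsets(closed_itemsets):
    """Naively expand frequent closed itemsets into all frequent itemsets.

    closed_itemsets: dict mapping frozenset of items -> support count.
    Returns a dict mapping every frequent itemset (frozenset) -> support.
    Hypothetical baseline for illustration; it enumerates every non-empty
    subset of every closed itemset, which is exponential in itemset size.
    """
    frequent = {}
    for closed, support in closed_itemsets.items():
        items = sorted(closed)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                # An itemset's support is the maximum support over all
                # closed itemsets that contain it.
                if support > frequent.get(key, 0):
                    frequent[key] = support
    return frequent


# Toy example (assumed data): transactions {a,b,c}, {a,b}, {a,c}, {a}
# with a minimum support count of 2 yield these three closed itemsets.
fcis = {
    frozenset("a"): 4,
    frozenset("ab"): 2,
    frozenset("ac"): 2,
}
fis = derive_frequent_itemsets(fcis)
for itemset, support in sorted(fis.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Because this baseline revisits the same subsets across overlapping closed itemsets, it shows the redundancy that a dedicated deriving algorithm would aim to avoid.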
Pages: 7002-7023
Page count: 22