Efficient algorithms for deriving complete frequent itemsets from frequent closed itemsets

Cited by: 4
Authors
Wu, Cheng-Wei [1]
Huang, JianTao [1]
Lin, Yun-Wei [1]
Chuang, Chien-Yu [1]
Tseng, Yu-Chee [2]
Affiliations
[1] Natl Ilan Univ, Yilan, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Yilan, Taiwan
Keywords
Frequent itemset mining; Frequent closed itemset mining; Lossless and condensed representation; Deriving algorithms;
DOI
10.1007/s10489-020-02172-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mining frequent itemsets (FIs) from dense datasets usually produces too many itemsets, so the mining task suffers from very long execution times and high memory consumption. Frequent closed itemsets (FCIs) are a compact and lossless representation of FIs. Mining FCIs not only reduces execution time and memory usage but also preserves the complete information of the FIs, which can be derived from the FCIs. Although many studies have proposed efficient methods for mining FCIs, few have developed algorithms for efficiently deriving FIs from FCIs. In this work, we propose two efficient algorithms, DFI-List and DFI-Growth, for deriving FIs from FCIs. Both algorithms adopt depth-first search and a divide-and-conquer methodology to derive all the FIs. DFI-List derives all the FIs with a vertical index structure called Cid List. DFI-Growth compresses the information of the FCIs into tree structures and applies a pattern-growth strategy to derive FIs from the trees. Empirical experiments show that DFI-List is the most efficient and scalable algorithm on dense datasets. For example, when the minimum support threshold is set to 50% on the Chess dataset, DFI-List runs over 100 times faster than LevelWise (Pasquier et al. Inf Syst 24(1): 25-46, 1999b). DFI-Growth, in turn, is the most stable and memory-efficient algorithm on sparse datasets. Both DFI-Growth and DFI-List are superior to the state-of-the-art algorithm (Pasquier et al. Inf Syst 24(1): 25-46, 1999b) in terms of execution time.
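To illustrate why FCIs are a lossless representation, the following is a minimal Python sketch of the general property: every frequent itemset is a subset of some FCI, and its support equals the largest support among the FCIs that contain it. This is a naive enumeration, not DFI-List or DFI-Growth, neither of which is specified in the abstract. The function name `derive_frequent_itemsets` and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def derive_frequent_itemsets(closed_itemsets):
    """Naively derive all FIs and their supports from a dict {FCI: support}.

    Illustrative only: enumerates every non-empty subset of each FCI and
    keeps the largest support seen, which is exponential in FCI size.
    """
    supports = {}
    for closed, support in closed_itemsets.items():
        items = sorted(closed)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                # The support of an FI is the maximum support over all
                # FCIs that are supersets of it.
                if support > supports.get(key, 0):
                    supports[key] = support
    return supports


# Hypothetical toy input: the FCIs of the transactions
# {a,b,c}, {a,b}, {a,c}, {a} at a minimum support count of 2.
toy_fcis = {
    frozenset("a"): 4,
    frozenset("ab"): 2,
    frozenset("ac"): 2,
}

for itemset, support in sorted(
    derive_frequent_itemsets(toy_fcis).items(),
    key=lambda pair: (len(pair[0]), sorted(pair[0])),
):
    print(sorted(itemset), support)
```

In this toy case the three FCIs expand to five FIs ({a}, {b}, {c}, {a,b}, {a,c}). The itemset {b,c} is correctly absent because no FCI contains it. The exponential subset enumeration is the cost that dedicated deriving algorithms aim to avoid.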
Pages: 7002 - 7023
Page count: 22
Related papers
50 in total
  • [31] Research on an algorithm for mining frequent closed itemsets
    Zhu, Yuquan
    Song, Yuqing
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2007, 44 (07): : 1177 - 1183
  • [32] Frequent closed itemsets mining using ITBitree
    Ren, Jiadong
    Song, Wei
    Yu, Shiying
    International Journal of Advancements in Computing Technology, 2012, 4 (17) : 271 - 279
  • [33] Mining frequent closed itemsets for large data
    Fu, HG
    Nguifo, EM
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA'04), 2004, : 328 - 335
  • [34] Discovering frequent closed itemsets for association rules
    Pasquier, N
    Bastide, Y
    Taouil, R
    Lakhal, L
    DATABASE THEORY - ICDT'99, 1999, 1540 : 398 - 416
  • [35] Improved algorithm for mining frequent closed itemsets
    Song, Wei
    Yang, Bingru
    Xu, Zhangyan
    Gao, Jing
    2008, Science Press, 18,Shuangqing Street,Haidian, Beijing, 100085, China (45):
  • [36] Mining frequent closed itemsets out of core
    Lucchese, Claudio
    Orlando, Salvatore
    Perego, Raffaele
    PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, : 419 - +
  • [37] Automatic discovery of locally frequent itemsets in the presence of highly frequent itemsets
    Bodon, Ferenc
    Kouris, Ioannis N.
    Makris, Christos H.
    Tsakalidis, Athanasios K.
    INTELLIGENT DATA ANALYSIS, 2005, 9 (01) : 83 - 104
  • [38] Associative classification based on closed frequent itemsets
    Li, X.-M., 1600, Univ. of Electronic Science and Technology of China (41):
  • [39] New algorithm of mining frequent closed itemsets
    School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
    J. Southeast Univ. Engl. Ed., 2008, 3 (335-338):
  • [40] Efficient mining of frequent itemsets from data streams
    Leung, Carson Kai-Sang
    Brajczuk, Dale A.
    SHARING DATA, INFORMATION AND KNOWLEDGE, PROCEEDINGS, 2008, 5071 : 2 - 14