Efficiently Mining Frequent Itemsets on Massive Data

Cited by: 13
Authors
Han, Xixian [1]
Liu, Xianmin [1]
Chen, Jian [1]
Lai, Guojun [1]
Gao, Hong [1]
Li, Jianzhong [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
Source
IEEE ACCESS | 2019 / Vol. 7
Funding
National Natural Science Foundation of China;
Keywords
Frequent itemset mining; massive data; PFIM algorithm; pruning rule; incremental update; ALGORITHMS; PATTERNS;
DOI
10.1109/ACCESS.2019.2902602
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Frequent itemset mining is an important operation that returns all itemsets in a transaction table that occur as a subset of at least a specified fraction of the transactions. Existing algorithms cannot compute frequent itemsets on massive data efficiently, since they either require multiple scans of the table or construct complex data structures that normally exceed the available memory on massive data. This paper proposes a novel precomputation-based frequent itemset mining (PFIM) algorithm to compute frequent itemsets quickly on massive data. PFIM treats the transaction table as two parts: a large old table storing historical data and a relatively small new table storing newly generated data. PFIM first preconstructs, on the old table, the quasi-frequent itemsets whose supports are above the lower bound of the practical support level. Given a specified support threshold, PFIM can then quickly return the required frequent itemsets on the table by utilizing the quasi-frequent itemsets. Three pruning rules are presented to reduce the number of candidates involved. An incremental update strategy is devised to efficiently reconstruct the quasi-frequent itemsets when the tables are merged. Extensive experiments on synthetic and real-life data sets show that PFIM has a significant advantage over the existing algorithms and runs two orders of magnitude faster than the latest algorithm.
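The old-table/new-table idea in the abstract can be illustrated with a small sketch. This is not the PFIM algorithm from the paper: the function names (`count_itemsets`, `precompute_quasi_frequent`, `mine`), the brute-force counting, the `max_size` cap, and the single upper-bound pruning check are all assumptions made for illustration, and the paper's three pruning rules and incremental update strategy are not reproduced. The sketch only shows why itemsets precomputed above a lower-bound support on the old table are enough to answer any query whose threshold is at least that lower bound.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import ceil


def count_itemsets(transactions, max_size):
    """Brute-force count of every itemset with at most max_size items."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, min(max_size, len(items)) + 1):
            counts.update(combinations(items, k))
    return counts


def precompute_quasi_frequent(old, lower_bound, max_size):
    """Keep old-table itemsets whose support is at least lower_bound."""
    min_count = Fraction(lower_bound) * len(old)
    counts = count_itemsets(old, max_size)
    return {s: c for s, c in counts.items() if c >= min_count}


def mine(quasi, old, new, lower_bound, threshold, max_size):
    """Frequent itemsets over old + new, reusing the precomputed counts."""
    lower_bound, threshold = Fraction(lower_bound), Fraction(threshold)
    if threshold < lower_bound:
        raise ValueError("threshold must not be below the precomputed lower bound")
    need = threshold * (len(old) + len(new))
    # Any itemset missing from quasi occurs fewer than lower_bound * |old|
    # times in the old table, so this is the most it can contribute.
    old_cap = ceil(lower_bound * len(old)) - 1
    new_counts = count_itemsets(new, max_size)
    result = {}
    for s, c in quasi.items():
        total = c + new_counts.get(s, 0)
        if total >= need:
            result[s] = total
    for s, c in new_counts.items():
        if s in quasi or c + old_cap < need:
            continue  # already handled, or pruned by the upper bound
        # Hypothetical fallback: verify the surviving candidate on the old table.
        total = c + sum(1 for t in old if set(s) <= set(t))
        if total >= need:
            result[s] = total
    return result


# Toy data (made up for this sketch).
old = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
       {"a", "b", "c", "d"}, {"a"}, {"b"}, {"a", "b"}]
new = [{"a", "b"}, {"d", "e"}, {"d", "e"}, {"d", "e"}]

quasi = precompute_quasi_frequent(old, Fraction(1, 4), max_size=2)
result = mine(quasi, old, new, Fraction(1, 4), Fraction(1, 3), max_size=2)
```

Thresholds are passed as `Fraction` values so that comparisons such as "count is at least one third of 12 transactions" are exact. Itemsets that appear in neither `quasi` nor the new table cannot reach the threshold, which is why the sketch never enumerates them.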
Pages: 31409-31421
Number of pages: 13
Related papers
50 in total
  • [1] Efficiently mining maximal frequent itemsets
    Gouda, K
    Zaki, MJ
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 163 - 170
  • [2] An adaptive approach to mining frequent itemsets efficiently
    Tseng, Fan-Chen
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (18) : 13166 - 13172
  • [3] EFFICIENTLY MINING FREQUENT ITEMSETS IN TRANSACTIONAL DATABASES
    Alghyaline, Salah
    Hsieh, Jun-Wei
    Lai, Jim Z. C.
    [J]. JOURNAL OF MARINE SCIENCE AND TECHNOLOGY-TAIWAN, 2016, 24 (02): : 184 - 191
  • [4] Efficiently Mining Maximal Diverse Frequent Itemsets
    Wu, Dingming
    Luo, Dexin
    Jensen, Christian S.
    Huang, Joshua Zhexue
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT II, 2019, 11447 : 191 - 207
  • [5] SuffixMiner: Efficiently mining frequent itemsets in data streams by suffix-forest
    Jia, LF
    Zhou, CG
    Wang, Z
    Xu, XJ
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 592 - 595
  • [6] EFFICIENTLY USING PRIME-ENCODING FOR MINING FREQUENT ITEMSETS IN SPARSE DATA
    Gouda, Karam
    Hassaan, Mosab
    [J]. COMPUTING AND INFORMATICS, 2013, 32 (05) : 1079 - 1099
  • [7] Efficiently mining maximal frequent itemsets based on digraph
    Ren, Zhibo
    Zhang, Qiang
    Ma, Xiujuan
    [J]. FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2007, : 140 - +
  • [8] Efficiently mining frequent itemsets with weight and recency constraints
    Jerry Chun-Wei Lin
    Wensheng Gan
    Philippe Fournier-Viger
    Han-Chieh Chao
    Tzung-Pei Hong
    [J]. Applied Intelligence, 2017, 47 : 769 - 792
  • [9] Efficiently mining frequent itemsets applied for textual aggregation
    Bouakkaz, Mustapha
    Ouinten, Youcef
    Loudcher, Sabine
    Fournier-Viger, Philippe
    [J]. APPLIED INTELLIGENCE, 2018, 48 (04) : 1013 - 1019