Efficient Top-k Frequent Itemset Mining on Massive Data

Authors
Wan, Xiaolong [1]
Han, Xixian [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, 92 Xidazhi St, Harbin, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Top-k frequent itemset mining; PTF; Prefix-based partitioning; Hybrid vertical storage; Candidate pruning; PATTERNS; ALGORITHM; CONSTRAINTS; SUPPORT
DOI
10.1007/s41019-024-00241-2
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Top-k frequent itemset mining (top-k FIM) plays an important role in many practical applications. It reports the k itemsets with the highest supports. Unlike FIM, which requires a hard-to-tune minimum support threshold, top-k FIM needs only the more intuitive parameter k, the number of results. Existing algorithms require at least two scans of the table and incur high execution cost on massive data. This paper develops PTF, a prefix-partitioning-based algorithm that mines top-k frequent itemsets efficiently, where each prefix-based partition keeps the transactions sharing the same prefix item. PTF can directly skip most partitions, namely those that cannot generate any top-k frequent itemset. A vertical mining procedure processes the partitions, which are stored in vertical representation, under a high-support-first principle, so only a small fraction of the items are involved in processing the partitions. Two improvements reduce execution cost further: a hybrid vertical storage mode maintains the prefix-based partitions adaptively, and candidate pruning reduces the number of explored candidates. Extensive experimental results show that, on massive data, PTF achieves up to a 1348.53-fold speedup and incurs up to 355.31 times less I/O cost than state-of-the-art algorithms.
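To make the problem statement concrete, the following is a minimal in-memory sketch of top-k frequent itemset mining over a vertical (item to transaction-id set) representation, expanding candidates in high-support-first order. It is not the PTF algorithm: prefix-based partitioning, hybrid vertical storage, and candidate pruning are all omitted, and the function name `top_k_frequent_itemsets` and the toy data are assumptions made for illustration. The sketch relies only on the fact that adding an item to an itemset can never increase its support, so a best-first search emits itemsets in non-increasing support order.

```python
import heapq
from collections import defaultdict


def top_k_frequent_itemsets(transactions, k):
    """Return up to k (itemset, support) pairs in non-increasing support order.

    Illustrative sketch only (hypothetical helper, not the paper's PTF).
    Items are assumed to be mutually comparable, e.g. all strings.
    """
    # Vertical representation: item -> set of ids of transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    # Fixed global item order: descending support, ties broken by item value.
    order = sorted(tidsets, key=lambda it: (-len(tidsets[it]), it))

    # Max-heap via negated support. Each entry holds the itemset, the position
    # of its last item in `order`, and the ids of the transactions containing it.
    heap = [(-len(tidsets[it]), (it,), pos, tidsets[it])
            for pos, it in enumerate(order)]
    heapq.heapify(heap)

    results = []
    while heap and len(results) < k:
        neg_support, itemset, last, tids = heapq.heappop(heap)
        results.append((itemset, -neg_support))
        # Extend only with later items so every itemset is generated once,
        # from its unique prefix, whose support is at least as high.
        for pos in range(last + 1, len(order)):
            extended = tids & tidsets[order[pos]]
            if extended:
                heapq.heappush(
                    heap, (-len(extended), itemset + (order[pos],), pos, extended)
                )
    return results


toy_transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "c"]]
print(top_k_frequent_itemsets(toy_transactions, 3))
# → [(('a',), 4), (('b',), 3), (('c',), 3)]
```

The caller supplies only k, with no minimum support threshold, which is the usability point the abstract makes. Ties in support are broken here by itemset order, an arbitrary choice. This sketch holds all transaction-id sets in memory, which is exactly the cost that a disk-oriented method for massive data would need to avoid.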
Pages: 177-203
Page count: 27