Mining top-K frequent itemsets through progressive sampling

Cited by: 28
Authors
Pietracaprina, Andrea [2]
Riondato, Matteo [1]
Upfal, Eli [1]
Vandin, Fabio [1]
Affiliations
[1] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
[2] Univ Padua, Dipartimento Ingn Informaz, Padua, Italy
Funding
National Science Foundation (USA);
Keywords
Sampling; Top-K frequent itemsets; Frequent itemsets mining; Bloom filters; Progressive sampling;
DOI
10.1007/s10618-010-0185-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this end, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes) all very frequent (resp., very infrequent) itemsets, together with an estimate of these itemsets' frequencies with a bounded error. Our first result is an upper bound on the sample size which guarantees that the top-K frequent itemsets mined from a random sample of that size approximate the actual top-K frequent itemsets, with probability larger than a specified value. We show that the upper bound is asymptotically tight when w is constant. Our main algorithmic contribution is a progressive sampling approach, combined with suitable stopping conditions, which on appropriate inputs is able to extract approximate top-K frequent itemsets from samples whose sizes are smaller than the general upper bound. In order to test the stopping conditions, this approach maintains the frequency of all itemsets encountered, which is practical only for small w. However, we show how this problem can be mitigated by using a variation of Bloom filters. A number of experiments conducted on both synthetic and real benchmark datasets show that using samples substantially smaller than the original dataset (i.e., of size defined by the upper bound or reached through the progressive sampling approach) makes it possible to approximate the actual top-K frequent itemsets with accuracy much higher than what is analytically proved.
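The progressive sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the stopping rule used here (the top-K list and its estimated frequencies stay within a tolerance `tol` between two consecutive sample sizes) is a simple stand-in assumed for illustration, and the names `itemset_frequencies`, `top_k`, `progressive_top_k`, as well as the parameters `start`, `growth`, `tol`, and `seed`, are hypothetical. The sketch also keeps the frequency of every itemset of cardinality at most w in an exact counter, which is the memory cost that the abstract says is practical only for small w.

```python
import random
from collections import Counter
from itertools import combinations


def itemset_frequencies(transactions, w):
    """Frequency (fraction of transactions) of every itemset of size 1..w."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, w + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}


def top_k(freqs, k):
    """The k most frequent itemsets; ties are broken by itemset order."""
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def progressive_top_k(dataset, k, w, start=100, growth=2, tol=0.01, seed=0):
    """Grow the sample until the top-k estimate stabilises.

    Assumed stopping rule (NOT the paper's conditions): stop when every
    itemset in the current top-k was also in the previous top-k and its
    estimated frequency moved by at most `tol`.
    Returns (list of (itemset, estimated frequency), sample size used).
    """
    rng = random.Random(seed)
    size = start
    previous = None
    while size < len(dataset):
        # Draw transactions uniformly at random, with replacement.
        sample = [rng.choice(dataset) for _ in range(size)]
        current = top_k(itemset_frequencies(sample, w), k)
        if previous is not None:
            prev = dict(previous)
            if all(s in prev and abs(f - prev[s]) <= tol for s, f in current):
                return current, size
        previous = current
        size *= growth
    # The sample would be as large as the dataset: mine the dataset exactly.
    return top_k(itemset_frequencies(dataset, w), k), len(dataset)
```

For example, `progressive_top_k(transactions, k=10, w=2)` returns the estimated top-10 itemsets of cardinality at most 2 together with the sample size at which the assumed rule fired. Unlike the paper's stopping conditions, this stand-in rule carries no probabilistic guarantee on the quality of the approximation.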
Pages: 310 - 326
Number of pages: 17
Related papers
50 in total
  • [21] Efficient Algorithms for Mining Top-K High Utility Itemsets
    Tseng, Vincent S.
    Wu, Cheng-Wei
    Fournier-Viger, Philippe
    Yu, Philip S.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (01) : 54 - 67
  • [22] Mining top-k frequent closed itemsets over data streams using the sliding window model
    Tsai, Pauray S. M.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (10) : 6968 - 6973
  • [23] Mining of top-k high utility itemsets with negative utility
    Sun, Rui
    Han, Meng
    Zhang, Chunyan
    Shen, Mingyao
    Du, Shiyu
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (03) : 5637 - 5652
  • [24] A Declarative Framework for Mining Top-k High Utility Itemsets
    Hidouri, Amel
    Jabbour, Said
    Raddaoui, Badran
    Chebbah, Mouna
    Ben Yaghlane, Boutheina
    [J]. BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY (DAWAK 2021), 2021, 12925 : 250 - 256
  • [25] Discovering Top-k Probabilistic Frequent Itemsets from Uncertain Databases
    Li, Haifeng
    Zhang, Yuejin
    Zhang, Ning
    [J]. 5TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2017, 2017, 122 : 1124 - 1132
  • [26] FS3: A Sampling based method for top-k Frequent Subgraph Mining
    Saha, Tanay Kumar
    Al Hasan, Mohammad
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014,
  • [27] FS3: A Sampling Based Method for Top-k Frequent Subgraph Mining
    Saha, Tanay Kumar
    Al Hasan, Mohammad
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2015, 8 (04) : 245 - 261
  • [28] TKG: Efficient Mining of Top-K Frequent Subgraphs
    Fournier-Viger, Philippe
    Cheng, Chao
    Lin, Jerry Chun-Wei
    Yun, Unil
    Kiran, R. Uday
    [J]. BIG DATA ANALYTICS (BDA 2019), 2019, 11932 : 209 - 226
  • [29] Mining top-k frequent closed itemsets is not in APX
    Wu, Chienwen
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 435 - 439
  • [30] Mining Top-k Frequent-regular Itemsets from Data Streams Based on Sliding Window Technique
    Mesama, Tashinee
    Amphawan, Komate
    [J]. 2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018, : 224 - 230