Mining Frequent Itemsets through Progressive Sampling with Rademacher Averages

Cited by: 31
Authors
Riondato, Matteo [1]
Upfal, Eli [1]
Affiliations
[1] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
Keywords
Frequent Itemsets; Pattern Mining; Rademacher Averages; Sampling; Statistical Learning Theory
DOI
10.1145/2783258.2783265
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present an algorithm to extract a high-quality approximation of the (top-k) frequent itemsets (FIs) from random samples of a transactional dataset. With high probability, the approximation is a superset of the FIs, and it includes no itemset with frequency much lower than the threshold. The algorithm employs progressive sampling, with a stopping condition based on bounds to the empirical Rademacher average, a key concept from statistical learning theory. The computation of the bounds uses characteristic quantities that can be obtained efficiently with a single scan of the sample. Evaluating the stopping condition is therefore fast and does not require an expensive mining of each sample. Our experimental evaluation confirms the practicality of our approach on real datasets, where it outperforms approaches based on one-shot static sampling.
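The progressive-sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The abstract describes bounds computed from quantities gathered in a single scan of the sample, whereas this sketch swaps in a plain Monte Carlo estimate of the empirical Rademacher average over itemsets of at most two items. The function names, the restriction to small itemsets, the doubling sample schedule, and the constants in the deviation formula are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def candidate_itemsets(sample, max_size=2):
    """Collect every itemset of up to max_size items that occurs in the sample."""
    found = set()
    for transaction in sample:
        items = sorted(transaction)
        for k in range(1, max_size + 1):
            found.update(combinations(items, k))
    return [frozenset(c) for c in found]


def empirical_rademacher(sample, itemsets, trials=50, rng=random):
    """Monte Carlo estimate of E_sigma[max_A (1/n) * sum_i sigma_i * 1[A subset of t_i]].

    This is a stand-in for the bound used in the paper (assumption).
    """
    n = len(sample)
    if n == 0 or not itemsets:
        return 0.0
    # For each itemset, the indices of the sampled transactions that contain it.
    cover = [[i for i, t in enumerate(sample) if a <= t] for a in itemsets]
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(sum(sigma[i] for i in idx) for idx in cover) / n
    return total / trials


def progressive_sample(dataset, target_eps, delta=0.1,
                       start=100, growth=2, max_n=3200, seed=0):
    """Grow the sample until the estimated deviation bound drops below target_eps.

    The schedule (start, growth, max_n) is hypothetical.
    """
    rng = random.Random(seed)
    n = start
    while True:
        # Draw transactions uniformly at random, with replacement.
        sample = [rng.choice(dataset) for _ in range(n)]
        r = empirical_rademacher(sample, candidate_itemsets(sample), rng=rng)
        # Assumed form of a Rademacher-style deviation bound; the constants
        # are illustrative and not taken from the paper.
        eps = 2 * r + math.sqrt(2 * math.log(2 / delta) / n)
        if eps <= target_eps or n >= max_n:
            return sample, eps
        n *= growth
```

A caller would pass the dataset as a list of `frozenset` transactions and mine only the returned sample, lowering the frequency threshold by an amount tied to the returned `eps`. The stopping test never mines the sample, which mirrors the abstract's point that evaluating the condition should be cheap.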
Pages: 1005-1014
Number of pages: 10
Related papers
50 in total
  • [1] Mining top-K frequent itemsets through progressive sampling
    Pietracaprina, Andrea
    Riondato, Matteo
    Upfal, Eli
    Vandin, Fabio
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 21 (02) : 310 - 326
  • [2] Mining top-K frequent itemsets through progressive sampling
    Andrea Pietracaprina
    Matteo Riondato
    Eli Upfal
    Fabio Vandin
    [J]. Data Mining and Knowledge Discovery, 2010, 21 : 310 - 326
  • [3] Efficient frequent itemsets mining through sampling and information granulation
    Zhang, Zhongjie
    Pedrycz, Witold
    Huang, Jian
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 65 : 119 - 136
  • [4] Efficient frequent itemsets mining by sampling
    Zhao, Yanchang
    Zhang, Chengqi
    Zhang, Shichao
    [J]. ADVANCES IN INTELLIGENT IT: ACTIVE MEDIA TECHNOLOGY 2006, 2006, 138 : 112 - +
  • [5] The research of sampling for mining frequent itemsets
    Hu, Xuegang
    Yu, Haitao
    [J]. ROUGH SETS AND KNOWLEDGE TECHNOLOGY, PROCEEDINGS, 2006, 4062 : 496 - 501
  • [6] Mining Maximal Frequent Itemsets over Sampling Databases
    Li, Haifeng
    [J]. PROCEEDINGS OF THE 2015 2ND INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION (IFEEA 2015), 2016, 54 : 28 - 31
  • [7] Progressive rademacher sampling
    Elomaa, T
    Kääriäinen, M
    [J]. EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002, : 140 - 145
  • [8] Mining frequent itemsets in a stream
    Calders, Toon
    Dexters, Nele
    Goethals, Bart
    [J]. ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 83 - +
  • [9] An Algorithm for Mining Frequent Itemsets
    Hernandez Leon, Raudel
    Perez Suarez, Airel
    Feregrino Uribe, Claudia
    Guzman Zavaleta, Zobeida Jezabel
    [J]. 2008 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, COMPUTING SCIENCE AND AUTOMATIC CONTROL (CCE 2008), 2008, : 236 - +
  • [10] Mining frequent itemsets in a stream
    Calders, Toon
    Dexters, Nele
    Gillis, Joris J. M.
    Goethals, Bart
    [J]. INFORMATION SYSTEMS, 2014, 39 : 233 - 255