A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce

Cited by: 0
Authors
Fumarola, Fabio [1]
Malerba, Donato [1]
Affiliation
[1] Univ Bari Aldo Moro, Dept Comp Sci, Via E Orabona 4, I-70125 Bari, Italy
Keywords
Map-Reduce; Frequent Itemset Mining; Chernoff Bound;
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, several algorithms based on the MapReduce framework have been proposed for frequent pattern mining in Big Data. However, these solutions come with their own technical challenges, such as inter-communication costs, in-process synchronization, balanced data distribution, and input parameter tuning, which negatively affect computation time. In this paper we present MrAdam, a novel parallel, distributed algorithm that addresses these problems. The key principle underlying the design of MrAdam is that one can make reasonable decisions in the absence of perfect answers. Indeed, given the classical minimum support threshold and a user-specified error bound, MrAdam exploits the Chernoff bound to mine "approximate" frequent itemsets with statistical error guarantees on their actual supports. These itemsets are generated in parallel and independently from subsets of the input dataset by exploiting the MapReduce parallel computation framework. The resulting collections of frequent itemsets from each subset are aggregated and filtered using a novel technique to produce a single output collection. MrAdam scales well to gigabytes of data and tens of machines, as shown experimentally on real datasets. The experiments also show that the proposed algorithm returns a good, statistically bounded approximation of the exact results.
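The general idea of trading exactness for a statistical guarantee can be illustrated with a minimal sketch. This is not MrAdam itself: the abstract does not give the algorithm's formulas or its aggregation technique, so the sketch assumes the additive Chernoff-Hoeffding bound, P(|s_sample - s_true| >= epsilon) <= 2 exp(-2 n epsilon^2), which holds for one fixed itemset at a time. The function names, the confidence parameter `delta`, the brute-force counting, and the lowered threshold `min_support - epsilon` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta.

    With n sampled transactions, the sample support of one fixed itemset
    deviates from its true support by epsilon or more with probability
    at most delta (additive Chernoff-Hoeffding bound).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner (illustrative only): relative support of every
    itemset with at most max_size items that reaches min_support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def approximate_frequent_itemsets(transactions, min_support, epsilon, delta, seed=0):
    """Mine a random sample instead of the full dataset.

    Assumption: the threshold is lowered by epsilon so that an itemset
    whose true support reaches min_support is unlikely to be missed.
    """
    n = hoeffding_sample_size(epsilon, delta)
    rng = random.Random(seed)
    # Sample with replacement so the draws are independent.
    sample = [rng.choice(transactions) for _ in range(n)]
    return frequent_itemsets(sample, max(0.0, min_support - epsilon))
```

For example, `hoeffding_sample_size(0.05, 0.05)` asks how many sampled transactions keep the support estimate of a single itemset within 0.05 of its true value with probability at least 0.95. The required size depends only on `epsilon` and `delta`, not on the size of the dataset, which is what makes mining independent subsets attractive in a MapReduce setting.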
Pages: 335-342
Number of pages: 8