A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce

Authors
Fumarola, Fabio [1]
Malerba, Donato [1]
Affiliations
[1] Univ Bari Aldo Moro, Dept Comp Sci, Via E Orabona 4, I-70125 Bari, Italy
Keywords
MapReduce; Frequent Itemset Mining; Chernoff Bound
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Recently, several algorithms based on the MapReduce framework have been proposed for frequent pattern mining in Big Data. However, the proposed solutions come with their own technical challenges, such as inter-communication costs, in-process synchronization, balanced data distribution and input parameter tuning, which negatively affect the computation time. In this paper we present MrAdam, a novel parallel, distributed algorithm which addresses these problems. The key principle underlying the design of MrAdam is that one can make reasonable decisions in the absence of perfect answers. Indeed, given the classical minimum support threshold and a user-specified error bound, MrAdam exploits the Chernoff bound to mine "approximate" frequent itemsets with statistical error guarantees on their actual supports. These itemsets are generated in parallel and independently from subsets of the input dataset, by exploiting the MapReduce parallel computation framework. The resulting collections of frequent itemsets from each subset are aggregated and filtered using a novel technique to provide a single output collection. MrAdam scales well on gigabytes of data and tens of machines, as shown experimentally on real datasets. The experiments also show that the proposed algorithm returns a good, statistically bounded approximation of the exact results.
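The role of the Chernoff bound can be illustrated with a small sketch. The abstract gives neither MrAdam's formulas nor its aggregation technique, so the code below is only an assumption-laden illustration and not the authors' method. It assumes the textbook additive Chernoff-Hoeffding bound, P(|s_hat - s| >= epsilon) <= 2 exp(-2 n epsilon^2), for a support s_hat estimated from n randomly drawn transactions, and it replaces the paper's aggregation step with a plain sum of per-subset counts followed by a threshold relaxed by epsilon. The names chernoff_sample_size, count_itemsets and merge_and_filter, and the parameter max_len, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def chernoff_sample_size(epsilon, delta):
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta.

    Assumption: the additive Chernoff-Hoeffding bound for one itemset,
    not necessarily the form of the bound used in the paper.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def count_itemsets(transactions, max_len=2):
    """Map-like step: count itemsets of up to max_len items in one subset."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def merge_and_filter(partial_counts, total, min_support, epsilon):
    """Reduce-like step: sum per-subset counts, keep itemsets whose
    relative support is at least min_support - epsilon.

    Assumption: a simple stand-in for the paper's aggregation technique.
    """
    merged = Counter()
    for counts in partial_counts:
        merged.update(counts)
    threshold = min_support - epsilon
    return {
        itemset: count / total
        for itemset, count in merged.items()
        if count / total >= threshold
    }


if __name__ == "__main__":
    # Two toy subsets of a four-transaction dataset.
    subsets = [
        [["a", "b"], ["a", "c"]],
        [["a", "b", "c"], ["b"]],
    ]
    partials = [count_itemsets(subset) for subset in subsets]
    print(chernoff_sample_size(epsilon=0.01, delta=0.05))
    print(merge_and_filter(partials, total=4, min_support=0.5, epsilon=0.25))
```

Relaxing the threshold by epsilon trades some false positives for a lower chance of missing itemsets whose true support is at or above the minimum support; how MrAdam itself balances this is described in the paper, not here.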
Pages: 335-342
Page count: 8