A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce

Authors
Fumarola, Fabio [1]
Malerba, Donato [1]
Affiliations
[1] Univ Bari Aldo Moro, Dept Comp Sci, Via E Orabona 4, I-70125 Bari, Italy
Keywords
Map-Reduce; Frequent Itemset Mining; Chernoff Bound
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Recently, several algorithms based on the MapReduce framework have been proposed for frequent pattern mining in Big Data. However, the proposed solutions come with their own technical challenges, such as inter-communication costs, in-process synchronization, balanced data distribution and input parameter tuning, which negatively affect the computation time. In this paper we present MrAdam, a novel parallel, distributed algorithm which addresses these problems. The key principle underlying the design of MrAdam is that one can make reasonable decisions in the absence of perfect answers. Indeed, given the classical threshold for minimum support and a user-specified error bound, MrAdam exploits the Chernoff bound to mine "approximate" frequent itemsets with statistical error guarantees on their actual supports. These itemsets are generated in parallel and independently from subsets of the input dataset, by exploiting the MapReduce parallel computation framework. The resulting collections of frequent itemsets from each subset are aggregated and filtered using a novel technique to provide a single output collection. MrAdam scales well on gigabytes of data and tens of machines, as shown experimentally on real datasets. The experiments also show that the proposed algorithm returns a good, statistically bounded approximation of the exact results.
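The general idea in the abstract, mining subsets independently and then merging the results under an error bound, can be illustrated with a small sketch. This is not MrAdam itself: the abstract does not describe the paper's aggregation technique or its exact use of the Chernoff bound, so everything below is an assumption made for illustration. The sample size comes from Hoeffding's inequality (an additive Chernoff-type bound) for a single itemset, P(|s_hat - s| >= epsilon) <= 2 exp(-2 n epsilon^2), and the function names `hoeffding_sample_size`, `local_supports` and `aggregate` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def hoeffding_sample_size(epsilon, delta):
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta.

    Assumption: bounds the support error of ONE itemset; covering many
    itemsets at once would additionally need a union bound.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def local_supports(transactions, max_len=2):
    """'Map' step (illustrative): relative support of every itemset of
    size <= max_len inside a single partition, by brute-force counting."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}


def aggregate(partials, sizes, min_support, epsilon):
    """'Reduce' step (illustrative, not the paper's technique):
    size-weighted average of the partition supports, keeping itemsets
    whose estimate reaches min_support - epsilon."""
    total = sum(sizes)
    estimate = Counter()
    for supports, n in zip(partials, sizes):
        for itemset, s in supports.items():
            estimate[itemset] += s * n / total
    threshold = min_support - epsilon
    return {i: s for i, s in estimate.items() if s >= threshold}


# Toy data: two partitions of a four-transaction dataset.
partitions = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"]],
]
partials = [local_supports(p) for p in partitions]
sizes = [len(p) for p in partitions]
result = aggregate(partials, sizes, min_support=0.5, epsilon=0.1)
```

On this toy input the merged estimates equal the exact supports, because every partition is counted in full. With sampled partitions the estimates would only be approximate, which is where a bound such as the one in `hoeffding_sample_size` becomes relevant.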
Pages: 335-342
Number of pages: 8