A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce

Cited by: 0

Authors:

Fumarola, Fabio [1]

Malerba, Donato [1]

Affiliations:

[1] Univ Bari Aldo Moro, Dept Comp Sci, Via E Orabona 4, I-70125 Bari, Italy

Source:

2014 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS) | 2014

Keywords:

Map-Reduce; Frequent Itemset Mining; Chernoff Bound;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Recently, several algorithms based on the MapReduce framework have been proposed for frequent pattern mining in Big Data. However, the proposed solutions come with their own technical challenges, such as inter-communication costs, in-process synchronization, balanced data distribution and input parameter tuning, which negatively affect the computation time. In this paper we present MrAdam, a novel parallel, distributed algorithm which addresses these problems. The key principle underlying the design of MrAdam is that one can make reasonable decisions in the absence of perfect answers. Indeed, given the classical threshold for minimum support and a user-specified error bound, MrAdam exploits the Chernoff bound to mine "approximate" frequent itemsets with statistical error guarantees on their actual supports. These itemsets are generated in parallel and independently from subsets of the input dataset, by exploiting the MapReduce parallel computation framework. The resulting collections of frequent itemsets from each subset are aggregated and filtered by using a novel technique to provide a single collection in output. MrAdam scales well to gigabytes of data and tens of machines, as shown experimentally on real datasets. In the experiments we also show that the proposed algorithm returns a good statistically bounded approximation of the exact results.
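The general idea described in the abstract (mine subsets independently, then aggregate under an error bound) can be illustrated with a minimal Python sketch. This is not MrAdam: the record does not give the algorithm's details, so the aggregation rule here (averaging per-subset supports over equal-sized subsets and keeping itemsets at or above `min_support - epsilon`) is an assumption made for illustration, and all function names are hypothetical. The sample-size formula is the standard additive Chernoff-Hoeffding bound, under which the estimated support of a single itemset deviates from its true support by more than `epsilon` with probability at most `delta`.

```python
import math
from collections import Counter
from itertools import combinations


def chernoff_sample_size(epsilon, delta):
    """Smallest n satisfying 2 * exp(-2 * n * epsilon**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def local_supports(transactions, max_len=2):
    """'Map' step: relative support of each itemset (up to max_len items) in one subset."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}


def approximate_frequent_itemsets(partitions, min_support, epsilon, max_len=2):
    """'Reduce' step (illustrative assumption): average the per-subset supports
    over equal-sized subsets and keep itemsets at or above min_support - epsilon."""
    totals = Counter()
    for part in partitions:
        for itemset, s in local_supports(part, max_len).items():
            totals[itemset] += s
    k = len(partitions)
    return {i: s / k for i, s in totals.items() if s / k >= min_support - epsilon}


if __name__ == "__main__":
    p1 = [["a", "b"], ["a", "b"], ["a"], ["c"]]
    p2 = [["a", "b"], ["a"], ["b"], ["c"]]
    print(chernoff_sample_size(0.05, 0.05))
    print(approximate_frequent_itemsets([p1, p2], min_support=0.4, epsilon=0.05))
```

In a real MapReduce deployment each call to `local_supports` would run on a separate worker; the sketch runs them sequentially and enumerates itemsets by brute force, which is only practical for tiny inputs.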


Pages: 335-342

Number of pages: 8

Related records (50 in total):

[31] Parallel Frequent Itemset Mining on Streaming Data
He, Yanshan
Yue, Min
[J]. 2014 10TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2014, : 725 - 730
[32] Approximate Frequent Itemset Mining for Streaming Data on FPGA
Li, Yubin
Sun, Yuliang
Dai, Guohao
Xu, Qiang
Wang, Yu
Yang, Huazhong
[J]. 2016 26TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2016,
[33] Implementation of an Improved Algorithm for Frequent Itemset Mining using Hadoop
Agarwal, Ruchi
Singh, Sunny
Vats, Satvik
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2016, : 13 - 18
[34] Mining Frequent Itemset Using Quine-McCluskey Algorithm
Bajpayee, Kanishka
Kant, Surya
Pant, Bhaskar
Chaudhary, Ankur
Sharma, Shashi Kumar
[J]. PROCEEDINGS OF FIFTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2015), VOL 2, 2016, 437 : 763 - 769
[35] Approximate Parallel High Utility Itemset Mining
Chen, Yan
An, Aijun
[J]. BIG DATA RESEARCH, 2016, 6 : 26 - 42
[36] Efficient Incremental Itemset Tree for Approximate Frequent Itemset Mining On Data Stream
Bai, Pavitra S.
Kumar, Ravi G. K.
[J]. PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT), 2016, : 239 - 242
[37] An Improved Version of the Frequent Itemset Mining Algorithm
Butincu, Cristian Nicolae
Craus, Mitica
[J]. 2015 14TH ROEDUNET INTERNATIONAL CONFERENCE - NETWORKING IN EDUCATION AND RESEARCH (ROEDUNET NER), 2015, : 184 - 189
[38] HBPFP-DC: A parallel frequent itemset mining using Spark
Xun, Yaling
Zhang, Jifu
Yang, Haifeng
Qin, Xiao
[J]. PARALLEL COMPUTING, 2021, 101
[39] The Choice of Optimal Algorithm for Frequent Itemset Mining
Busarov, Vyacheslav
Grafeeva, Natalia
Mikhailova, Elena
[J]. DATABASES AND INFORMATION SYSTEMS IX, 2016, 291 : 211 - 224
[40] Revised ECLAT Algorithm for Frequent Itemset Mining
Suvalka, Bharati
Khandelwal, Sarika
Patel, Chintal
[J]. INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, VOL 2, INDIA 2016, 2016, 434 : 219 - 226
