Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

Times cited: 0

Authors:

Singh, Sudhakar [1]

Garg, Rakhi [2]

Mishra, P. K. [1]

Affiliations:

[1] BHU, Inst Sci, Dept Comp Sci, Varanasi, Uttar Pradesh, India

[2] BHU, Mahila Mahavidyalaya, Dept Comp Sci, Varanasi, Uttar Pradesh, India

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA) | 2016

Keywords:

Frequent Itemset Mining; Apriori; Heterogeneous Hadoop Cluster; MapReduce; Big Data;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Designing fast and scalable algorithms for mining frequent itemsets has always been one of the most prominent and promising problems in data mining. Apriori is one of the most widely used and popular frequent itemset mining algorithms. Designing efficient algorithms on the MapReduce framework to process and analyze big datasets is an active area of contemporary research. In this paper, we focus on the performance of MapReduce-based Apriori on homogeneous as well as heterogeneous Hadoop clusters. We investigate a number of factors that significantly affect the execution time of MapReduce-based Apriori running on homogeneous and heterogeneous Hadoop clusters. These factors relate to both algorithmic and non-algorithmic improvements. The algorithmic factors considered are filtered transactions and data structures; experimental results show how an appropriate data structure and the filtered-transactions technique drastically reduce the execution time. The non-algorithmic factors include speculative execution, nodes with poor performance, data locality and the distribution of data blocks, and parallelism control through the input split size. We applied strategies to address these factors and fine-tuned the relevant parameters for our particular application. Experimental results show that when cluster-specific parameters are taken into account, execution time is reduced significantly. We also discuss issues in the MapReduce implementation of Apriori that may significantly influence performance.
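The abstract names filtered transactions and the choice of data structure as the algorithmic factors. The sketch below is a minimal, in-memory Python simulation of level-wise Apriori expressed as one map/reduce pass per itemset length k; it is not the authors' Hadoop implementation. All function names are hypothetical, the "filtered transactions" step is an assumption (items that occur in no candidate are dropped and transactions shorter than k are skipped), and a plain Python set stands in for whatever candidate structure (for example a hash tree or trie) a real implementation would use.

```python
from collections import defaultdict
from itertools import combinations


# Hypothetical names throughout; this only mimics Hadoop's map -> shuffle -> reduce
# flow in memory and is not the paper's implementation.

def apriori_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets and prune candidates that have an infrequent subset."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = tuple(sorted(set(a) | set(b)))
            # Keep only k-itemsets whose every (k-1)-subset is frequent (Apriori property).
            if len(union) == k and all(s in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def map_phase(transactions, candidates, k):
    """Mapper: emit (candidate, 1) for every candidate k-itemset a transaction contains."""
    candidate_items = {item for c in candidates for item in c}
    for t in transactions:
        # Assumed "filtered transactions" step: drop items that appear in no candidate,
        # then skip transactions too short to hold a k-itemset.
        filtered = sorted(set(t) & candidate_items)
        if len(filtered) < k:
            continue
        # A Python set stands in for a hash tree or trie when matching candidates.
        for subset in combinations(filtered, k):
            if subset in candidates:
                yield subset, 1


def reduce_phase(pairs, min_support):
    """Shuffle + reducer: sum the counts per itemset and keep those meeting min_support."""
    counts = defaultdict(int)
    for itemset, one in pairs:
        counts[itemset] += one
    return {s: c for s, c in counts.items() if c >= min_support}


def mapreduce_apriori(transactions, min_support):
    """Run one simulated MapReduce job per itemset length k until no candidates remain."""
    candidates = {(item,) for t in transactions for item in t}
    all_frequent = {}
    k = 1
    while candidates:
        frequent = reduce_phase(map_phase(transactions, candidates, k), min_support)
        all_frequent.update(frequent)
        k += 1
        candidates = apriori_gen(frequent, k)
    return all_frequent


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, count in sorted(mapreduce_apriori(transactions, min_support=3).items()):
        print(itemset, count)
```

Each loop iteration corresponds to one job that rescans the whole input, so the number of jobs grows with the length of the longest frequent itemset. Per-job costs such as task scheduling, the number of input splits, and the handling of slow tasks therefore recur at every level.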


Pages: 87-94

Page count: 8
