Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

Cited by: 0
Authors
Singh, Sudhakar [1]
Garg, Rakhi [2]
Mishra, P. K. [1]
Affiliations
[1] BHU, Inst Sci, Dept Comp Sci, Varanasi, Uttar Pradesh, India
[2] BHU, Mahila Mahavidyalaya, Dept Comp Sci, Varanasi, Uttar Pradesh, India
Keywords
Frequent Itemset Mining; Apriori; Heterogeneous Hadoop Cluster; MapReduce; Big Data
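DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract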
Designing fast and scalable algorithms for mining frequent itemsets has always been one of the most prominent and promising problems in data mining. Apriori is one of the most widely used and popular algorithms for frequent itemset mining. Designing efficient algorithms on the MapReduce framework to process and analyze big datasets is an active area of current research. In this paper, we focus on the performance of MapReduce-based Apriori on homogeneous as well as heterogeneous Hadoop clusters. We investigate a number of factors that significantly affect the execution time of MapReduce-based Apriori running on homogeneous and heterogeneous Hadoop clusters. The factors relate to both algorithmic and non-algorithmic improvements. The algorithmic factors considered are filtered transactions and data structures. Experimental results show how an appropriate data structure and the filtered transactions technique drastically reduce the execution time. The non-algorithmic factors include speculative execution, nodes with poor performance, data locality and distribution of data blocks, and parallelism control through input split size. We apply strategies against these factors and fine-tune the relevant parameters for our particular application. Experimental results show that when cluster-specific parameters are properly tuned, execution time is significantly reduced. We also discuss issues in the MapReduce implementation of Apriori that may significantly influence performance.
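As a rough illustration of the algorithmic side, the sketch below simulates one MapReduce-style Apriori loop in plain Python. It is not the authors' implementation: the function names (`map_phase`, `reduce_phase`, `generate_candidates`, `apriori`) are hypothetical, the in-memory "splits" stand in for HDFS input splits, and "filtered transactions" is assumed here to mean dropping items that no longer appear in any candidate itemset before counting.

```python
from collections import Counter
from itertools import combinations


def map_phase(split, candidates, k):
    """Mapper: emit (itemset, 1) for every candidate k-itemset in a transaction."""
    # Items that still occur in some candidate; all other items cannot contribute.
    live_items = set().union(*candidates) if candidates else set()
    for transaction in split:
        # Filtered transaction (assumed meaning): keep only items that can still matter.
        filtered = sorted(set(transaction) & live_items)
        if len(filtered) < k:
            continue  # too short to contain any k-itemset
        for subset in combinations(filtered, k):
            itemset = frozenset(subset)
            if itemset in candidates:  # hash-set lookup as the candidate store
                yield itemset, 1


def reduce_phase(pairs, min_support):
    """Reducer: sum the counts per itemset and keep those meeting min_support."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return {s: c for s, c in counts.items() if c >= min_support}


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets, then prune using downward closure."""
    prev = set(frequent)
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


def apriori(splits, min_support):
    """Run one simulated map/reduce job per pass k until no candidates remain."""
    items = {item for split in splits for t in split for item in t}
    candidates = {frozenset([i]) for i in items}
    k, result = 1, {}
    while candidates:
        pairs = (p for split in splits for p in map_phase(split, candidates, k))
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result
```

For example, with two splits `[[["a", "b", "c"], ["a", "b"]], [["a", "c"], ["b", "c"], ["a", "b", "c"]]]` and `min_support=3`, the sketch returns the three single items and the three pairs, while `{a, b, c}` (support 2) is counted in the third pass and rejected. The candidate store here is a Python hash set; the abstract's point about data structures suggests this choice would matter on a real cluster, but the sketch makes no claim about which structure the paper found best.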
Pages: 87-94
Number of pages: 8