Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

Authors
Singh, Sudhakar [1 ]
Garg, Rakhi [2 ]
Mishra, P. K. [1 ]
Affiliations
[1] BHU, Inst Sci, Dept Comp Sci, Varanasi, Uttar Pradesh, India
[2] BHU, Mahila Mahavidyalaya, Dept Comp Sci, Varanasi, Uttar Pradesh, India
Keywords
Frequent Itemset Mining; Apriori; Heterogeneous Hadoop Cluster; MapReduce; Big Data
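DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812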
Abstract
Designing fast and scalable algorithms for mining frequent itemsets has long been a prominent and promising problem in data mining. Apriori is one of the most widely used algorithms for frequent itemset mining. Designing efficient algorithms on the MapReduce framework to process and analyze big datasets is an active area of current research. In this paper, we focus on the performance of MapReduce-based Apriori on both homogeneous and heterogeneous Hadoop clusters. We investigate a number of factors that significantly affect the execution time of MapReduce-based Apriori running on such clusters. The factors relate to both algorithmic and non-algorithmic improvements. The algorithmic factors considered are filtered transactions and data structures. Experimental results show how an appropriate data structure and the filtered transactions technique drastically reduce the execution time. The non-algorithmic factors include speculative execution, poorly performing nodes, data locality and distribution of data blocks, and parallelism control through input split size. We applied strategies to address these factors and fine-tuned the relevant parameters for our particular application. Experimental results show that execution time drops significantly when cluster-specific parameters are tuned appropriately. We also discuss issues in the MapReduce implementation of Apriori that may significantly influence its performance.
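To make the algorithmic factors concrete, the following is a minimal single-process sketch of one MapReduce-style Apriori counting pass. It is not the authors' implementation. It assumes that "filtered transactions" means dropping items that were not frequent in the previous pass, and it uses a plain Python set as a stand-in for the candidate data structure. All function names (`map_count`, `reduce_count`, `filter_transaction`, `frequent_pairs`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_count(transaction, candidates, k):
    """Mapper sketch: emit (candidate, 1) for each size-k candidate in the transaction."""
    items = sorted(set(transaction))
    for subset in combinations(items, k):
        if subset in candidates:  # candidate lookup; a set stands in for the data structure
            yield subset, 1


def reduce_count(pairs, min_support):
    """Reducer sketch: sum the counts per itemset and keep those meeting min_support."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return {s: c for s, c in counts.items() if c >= min_support}


def filter_transaction(transaction, frequent_items):
    """Assumed meaning of 'filtered transactions': drop items known to be infrequent."""
    return [item for item in transaction if item in frequent_items]


def frequent_pairs(transactions, min_support):
    """Run pass 1 (single items), filter the transactions, then run pass 2 (pairs)."""
    singles = reduce_count(
        (((item,), 1) for t in transactions for item in set(t)), min_support
    )
    frequent_items = {s[0] for s in singles}
    filtered = [filter_transaction(t, frequent_items) for t in transactions]
    candidates = set(combinations(sorted(frequent_items), 2))
    pairs = (kv for t in filtered for kv in map_count(t, candidates, 2))
    return reduce_count(pairs, min_support)
```

In this sketch, filtering shrinks each transaction before the pair pass, so the mapper enumerates fewer subsets. On a real cluster the mapper and reducer would run as separate distributed tasks rather than as generator calls in one process.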
Pages: 87-94
Page count: 8