Performance Optimization for Short MapReduce Job Execution in Hadoop

Times cited: 12
Authors
Yan, Jinshuang [1]
Yang, Xiaoliang [1]
Gu, Rong [1]
Yuan, Chunfeng [1]
Huang, Yihua [1]
Affiliation
[1] Nanjing University, National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing 210093, Jiangsu, People's Republic of China
Keywords
MapReduce; parallel computing; job execution; performance optimization
DOI
10.1109/CGC.2012.40
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Hadoop MapReduce is a widely used parallel computing framework for solving data-intensive problems. To process large-scale datasets, the fundamental design of standard Hadoop places more emphasis on high data throughput than on job execution performance. This limits performance when Hadoop MapReduce is used to execute short jobs that require quick responses. To speed up the execution of short jobs, this paper proposes methods for improving the execution performance of MapReduce jobs. We made three major optimizations: first, we reduce the time spent in the initialization and termination stages of a job by optimizing its setup and cleanup tasks; second, we replace the pull-model task assignment mechanism with a push model; third, we replace the heartbeat-based communication mechanism with an instant message communication mechanism for event notifications between the JobTracker and the TaskTrackers. Experimental results show that, for our test application, our improved version of Hadoop executes jobs about 23% faster on average than standard Hadoop.
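The second and third optimizations both target idle time spent waiting for heartbeats. The following is a minimal toy model, not the authors' implementation, that contrasts the two mechanisms under simplifying assumptions: each stage (setup, map, reduce, cleanup) is treated as a single task with no parallel waves, heartbeats fire on a fixed grid, and in the push model every assignment or completion event costs one fixed message delay. The functions `pull_latency` and `push_latency`, the stage run times, the 3000 ms heartbeat period, and the 10 ms message delay are all hypothetical values chosen for illustration; the gap the model prints is not the paper's measured 23% speedup.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    run_ms: int  # hypothetical compute time of the stage's task, in milliseconds


def next_heartbeat(t_ms: int, period_ms: int) -> int:
    """Earliest heartbeat at or after t_ms, assuming heartbeats fire at 0, P, 2P, ..."""
    return -(-t_ms // period_ms) * period_ms  # integer ceiling division


def pull_latency(stages: List[Stage], period_ms: int) -> int:
    """Toy heartbeat-driven pull model: a TaskTracker receives work only in a
    heartbeat response, and a finished task is reported only at a heartbeat."""
    t = 0
    for s in stages:
        t = next_heartbeat(t, period_ms)  # wait for a heartbeat to be assigned the task
        t += s.run_ms                     # execute the task
        t = next_heartbeat(t, period_ms)  # completion becomes visible at a heartbeat
    return t


def push_latency(stages: List[Stage], msg_ms: int) -> int:
    """Toy push model with instant messages: assignment and completion events
    are sent as soon as they occur, each costing one message delay."""
    t = 0
    for s in stages:
        t += msg_ms    # task pushed to a TaskTracker
        t += s.run_ms  # execute the task
        t += msg_ms    # completion event sent to the JobTracker
    return t


job = [Stage("setup", 100), Stage("map", 2000), Stage("reduce", 1500), Stage("cleanup", 100)]
print(pull_latency(job, period_ms=3000))  # 12000
print(push_latency(job, msg_ms=10))       # 3780
```

In this toy model the pull overhead adds up to roughly one heartbeat period per stage regardless of how much work the job does, so it dominates when a job's compute time is comparable to the heartbeat interval, which is the short-job case the abstract describes.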
Pages: 688-694
Number of pages: 7