Performance Optimization for Short MapReduce Job Execution in Hadoop

Cited by: 12
Authors
Yan, Jinshuang [1]
Yang, Xiaoliang [1]
Gu, Rong [1]
Yuan, Chunfeng [1]
Huang, Yihua [1]
Affiliation
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Dept Comp Sci & Technol, Nanjing 210093, Jiangsu, Peoples R China
Keywords
MapReduce; parallel computing; job execution; performance optimization
DOI
10.1109/CGC.2012.40
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Hadoop MapReduce is a widely used parallel computing framework for solving data-intensive problems. To process large-scale datasets, the fundamental design of standard Hadoop emphasizes high data throughput over job execution performance. This limits performance when Hadoop MapReduce is used to execute short jobs that require quick responses. To speed up the execution of short jobs, this paper proposes optimization methods that improve the execution performance of MapReduce jobs. We made three major optimizations: first, we reduce the time spent in the initialization and termination stages of a job by optimizing its setup and cleanup tasks; second, we replace the pull-model task assignment mechanism with a push model; third, we replace the heartbeat-based communication mechanism with an instant-message communication mechanism for event notifications between the JobTracker and TaskTrackers. Experimental results show that, for our test application, job execution in our improved version of Hadoop is on average about 23% faster than in standard Hadoop.
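The abstract contrasts pull-model, heartbeat-driven coordination with push-model, instant-message coordination. The toy timing model below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a fixed heartbeat interval (3 s here, a hypothetical value), a job made of strictly sequential stages (for example setup, map, reduce, cleanup), and a fixed message latency in push mode. `TaskTracker`, `simulate_job`, and every parameter value are hypothetical illustrations. The model shows how waiting for heartbeats, both to receive a task and to report its completion, can dominate the run time of a short job.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Toy TaskTracker (hypothetical model) whose heartbeats fire at phase + k * interval."""
    name: str
    phase: float     # time of the first heartbeat, in seconds (assumed)
    interval: float  # seconds between heartbeats (assumed)

    def next_heartbeat(self, t: float) -> float:
        """Return the earliest heartbeat time at or after t."""
        if t <= self.phase:
            return self.phase
        k = math.ceil((t - self.phase) / self.interval)
        return self.phase + k * self.interval


def simulate_job(trackers, stage_durations, push, message_latency=0.01):
    """Return the completion time of a job whose stages run one after another.

    push=False (pull + heartbeat): a stage starts only when some tracker's next
    heartbeat asks for work, and its completion reaches the JobTracker only at
    that tracker's following heartbeat.
    push=True (push + instant messages): assignment and completion notification
    each cost one (assumed) message latency; `trackers` is not consulted.
    """
    t = 0.0
    for duration in stage_durations:
        if push:
            # JobTracker pushes the task at once; the tracker reports back at once.
            t += message_latency + duration + message_latency
        else:
            # Whichever tracker heartbeats first after the stage is ready gets the task.
            tracker = min(trackers, key=lambda tt: tt.next_heartbeat(t))
            finish = tracker.next_heartbeat(t) + duration
            # The JobTracker learns of the finish only at the next heartbeat.
            t = tracker.next_heartbeat(finish)
    return t


if __name__ == "__main__":
    trackers = [TaskTracker("tt1", phase=0.0, interval=3.0),
                TaskTracker("tt2", phase=1.0, interval=3.0)]
    stages = [0.5, 2.0, 2.0, 0.5]  # hypothetical setup, map, reduce, cleanup times
    print(simulate_job(trackers, stages, push=False))           # 12.0
    print(round(simulate_job(trackers, stages, push=True), 2))  # 5.08
```

Under these assumptions, the same 5 s of work finishes at 12 s with heartbeat-bound coordination and at about 5.08 s with push assignment and instant notification. The entire gap comes from waiting for heartbeats, which is a fixed cost per stage and therefore weighs most heavily on short jobs. These figures are illustrative only and are not the paper's measurements.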
Pages: 688-694
Number of pages: 7