Dynamic Job Ordering and Slot Configurations for MapReduce Workloads

Cited by: 21
Authors
Tang, Shanjiang [1]
Lee, Bu-Sung [2]
He, Bingsheng [2]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Funding
National Research Foundation, Singapore;
Keywords
MapReduce; Hadoop; flow-shops; scheduling algorithm; job ordering; 2-STAGE;
DOI
10.1109/TSC.2015.2426186
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) map tasks must generally be executed before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload yield significantly different performance and system utilization. This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. The first class focuses on optimizing the job ordering for a MapReduce workload under a given map/reduce slot configuration. In contrast, the second class considers the scenario in which the map/reduce slot configuration itself can also be optimized for the workload. We perform simulations as well as experiments on Amazon EC2 and show that the proposed algorithms produce results that are up to 15 to 80 percent better than the currently unoptimized Hadoop, leading to significant reductions in running time in practice.
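To illustrate why job order alone can change the makespan, the sketch below treats a workload as a classical two-stage flow shop and orders jobs with Johnson's rule. This is an assumption-laden simplification and is not claimed to be the algorithm proposed in the paper: each job is reduced to one aggregate map time and one aggregate reduce time, all map slots are collapsed into a single map stage and all reduce slots into a single reduce stage, and the job names and durations are hypothetical.

```python
from typing import List, Tuple

# (name, map_time, reduce_time): hypothetical aggregate stage durations per job.
Job = Tuple[str, float, float]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs with Johnson's rule for a two-stage flow shop.

    Jobs whose map time is <= their reduce time go first, by ascending map time;
    the remaining jobs go last, by descending reduce time.
    """
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(order: List[Job]) -> float:
    """Completion time of the last reduce stage for a given job order.

    Simplified model (an assumption): one map stage and one reduce stage,
    each processing one job at a time, and a job's reduce stage cannot
    start before its own map stage has finished.
    """
    map_done = 0.0
    reduce_done = 0.0
    for _, map_time, reduce_time in order:
        map_done += map_time
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


# Hypothetical workload, listed in submission order.
jobs: List[Job] = [("A", 4, 1), ("B", 1, 3), ("C", 2, 2), ("D", 3, 5)]

if __name__ == "__main__":
    ordered = johnson_order(jobs)
    print("submission order makespan:", makespan(jobs))
    print("Johnson order:", [name for name, _, _ in ordered])
    print("Johnson order makespan:", makespan(ordered))
```

In this toy workload the submission order finishes at time 15, while the reordered sequence B, C, D, A finishes at time 12, because short map stages run early and keep the reduce stage busy. Real clusters with many slots per stage and task-level parallelism do not reduce this cleanly to two machines, so the numbers are illustrative only.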
Pages: 4-17
Page count: 14