Dynamic Job Ordering and Slot Configurations for MapReduce Workloads

Cited by: 21
Authors
Tang, Shanjiang [1 ]
Lee, Bu-Sung [2 ]
He, Bingsheng [2 ]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Funding
National Research Foundation, Singapore;
Keywords
MapReduce; Hadoop; flow-shops; scheduling algorithm; job ordering; 2-STAGE;
DOI
10.1109/TSC.2015.2426186
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) map tasks are generally executed before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload yield significantly different performance and system utilization. This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. The first class focuses on job ordering optimization under a given map/reduce slot configuration. In contrast, the second class considers the scenario in which the map/reduce slot configuration itself can also be optimized. We perform simulations as well as experiments on Amazon EC2 and show that our proposed algorithms produce results that are 15 to 80 percent better than currently unoptimized Hadoop, leading to significant reductions in running time in practice.
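The abstract does not spell out the proposed algorithms, so the following is only an illustrative sketch of why job order matters in a two-stage (map, then reduce) setting. It uses the classic Johnson's rule for two-stage flow shops, which is a swapped-in textbook technique and not necessarily the authors' method. It also assumes a heavily simplified model: each job is reduced to one aggregate map time and one aggregate reduce time, and the map stage and the reduce stage each process one job at a time. The job names and durations are hypothetical.

```python
from typing import List, Tuple

# Hypothetical job record: (name, aggregate map time, aggregate reduce time).
Job = Tuple[str, float, float]


def makespan(order: List[Job]) -> float:
    """Makespan of a two-stage flow shop processing jobs in the given order.

    Simplifying assumption: the map stage and the reduce stage each handle
    one job at a time, and a job's reduce phase starts only after its own
    map phase and the previous job's reduce phase have both finished.
    """
    map_end = 0.0
    reduce_end = 0.0
    for _, map_time, reduce_time in order:
        map_end += map_time
        reduce_end = max(reduce_end, map_end) + reduce_time
    return reduce_end


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for the two-stage flow shop.

    Jobs whose map time is at most their reduce time go first, sorted by
    increasing map time; the remaining jobs go last, sorted by decreasing
    reduce time.
    """
    front = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    back = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return front + back


# Hypothetical workload, listed in submission order.
jobs: List[Job] = [("A", 5, 2), ("B", 1, 6), ("C", 9, 7), ("D", 3, 8)]

print(makespan(jobs))                          # 30.0 (submission order)
print([j[0] for j in johnson_order(jobs)])     # ['B', 'D', 'C', 'A']
print(makespan(johnson_order(jobs)))           # 24.0
```

In this toy workload, reordering alone shortens the makespan from 30 to 24 time units without changing any slot configuration. A real cluster has many map and reduce slots running tasks in parallel, so this single-server-per-stage model only conveys the intuition.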
Pages: 4-17
Page count: 14
Related papers
50 in total
  • [41] rTuner: A Performance Enhancement of MapReduce Job
    Patgiri, Ripon
    Das, Rajdeep
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON COMPUTER MODELING AND SIMULATION (ICCMS 2018), 2017, : 176 - 183
  • [42] Estimating runtime of a job in Hadoop MapReduce
    Narges Peyravi
    Ali Moeini
    Journal of Big Data, 7
  • [43] A REVIEW ON JOB SCHEDULING FOR HADOOP MAPREDUCE
    Kalia, Khushboo
    Gupta, Neeraj
    2017 INTERNATIONAL CONFERENCE ON NEXT GENERATION COMPUTING AND INFORMATION SYSTEMS (ICNGCIS), 2017, : 75 - 79
  • [44] HMM Optimized Modeling of SSD Storage for I/O MapReduce Workloads
    Alsayoud, Fatimah
    Miri, Ali
    2019 IEEE 10TH ANNUAL INFORMATION TECHNOLOGY, ELECTRONICS AND MOBILE COMMUNICATION CONFERENCE (IEMCON), 2019, : 177 - 183
  • [45] Big Data Processing with harnessing Hadoop - MapReduce for Optimizing Analytical Workloads
    Satish, Rama K., V
    Kavya, N. P.
    2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014, : 49 - 54
  • [46] Cost-Minimizing Preemptive Scheduling of MapReduce Workloads on Hybrid Clouds
    Qiu, Xuanjia
    Yeow, Wai Leong
    Wu, Chuan
    Lau, Francis C. M.
    2013 IEEE/ACM 21ST INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2013, : 213 - 218
  • [47] Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds
    Rao, B. Thirumala
    Reddy, L. S. S.
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2013, 13 (06): : 105 - 112
  • [48] Dynamic ranking-based MapReduce job scheduler to exploit heterogeneous performance in a virtualized environment
    Rathinaraja, J.
    Ananthanarayana, V. S.
    Paul, Anand
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (11): : 7520 - 7549
  • [49] Dynamic ranking-based MapReduce job scheduler to exploit heterogeneous performance in a virtualized environment
    J. Rathinaraja
    V. S. Ananthanarayana
    Anand Paul
    The Journal of Supercomputing, 2019, 75 : 7520 - 7549
  • [50] Heterogeneous Job Allocation Scheduler for Hadoop MapReduce Using Dynamic Grouping Integrated Neighboring Search
    Chen, Chi-Ting
    Hung, Ling-Ju
    Hsieh, Sun-Yuan
    Buyya, Rajkumar
    Zomaya, Albert Y.
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2020, 8 (01) : 193 - 206