A Hybrid Scheduling Algorithm for Data Intensive Workloads in a MapReduce Environment

Cited by: 16
Authors
Nguyen, Phuong [1]
Simon, Tyler [1]
Halem, Milton [1]
Chapman, David [1]
Le, Quang [2]
Affiliations
[1] University of Maryland, Baltimore County, Department of Computer Science and Electrical Engineering, Baltimore, MD 21228, USA
[2] General Dynamics Information Technology, Fairfax, VA 22030, USA
Keywords
Hadoop; Scheduler; dynamic priority; scheduling; MapReduce; workflow
DOI
10.1109/UCC.2012.32
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The choice of task scheduler for Hadoop MapReduce workloads can have a dramatic effect on job latency. The Hadoop Fair Scheduler (FairS) assigns resources to jobs such that, over time, all jobs receive on average an equal share of resources. It thus addresses a weakness of the FIFO scheduler, under which short jobs must wait for long-running jobs to complete. We show that even under FairS, jobs are still forced to wait significantly when the MapReduce system assigns equal shares of resources, because of the dependencies among the Map, Shuffle, Sort, and Reduce phases. We propose a Hybrid Scheduler (HybS) algorithm based on dynamic priority that reduces the latency of concurrent jobs of variable length while maintaining data locality. The dynamic priorities accommodate multiple task lengths, job sizes, and job waiting times by applying a greedy fractional knapsack algorithm to assign job tasks to processors. The estimated runtimes of Map and Reduce tasks that feed the HybS dynamic priorities are obtained from historical Hadoop log files. In addition to dynamic priority, we reorder task processor assignment according to data availability, so that the benefits of data locality are maintained automatically in this environment. We evaluate our approach by running concurrent workloads consisting of the Word-count and Terasort benchmarks and a satellite scientific data processing workload, and by developing a simulator. Our evaluation shows that HybS improves the average response time of the workloads by approximately 2.1x over the Hadoop FairS, with a standard deviation of 1.4x.
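The abstract names two mechanisms, a greedy fractional knapsack over dynamic priorities and a locality-aware reordering of task assignment, without giving their formulas. The Python sketch below illustrates both ideas under stated assumptions: the Job fields, the priority_density formula (value per slot rising with waiting time and falling with estimated task runtime), and the function names are hypothetical illustrations, not the HybS definitions from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending_tasks: int       # slots the job could use now (its knapsack "weight")
    est_task_runtime: float  # seconds; assumed to be estimated from historical logs
    wait_time: float         # seconds the job has waited so far


def priority_density(job: Job) -> float:
    """Hypothetical value per slot: grows with waiting time, shrinks with task length."""
    return (1.0 + job.wait_time) / job.est_task_runtime


def assign_slots(jobs: list[Job], free_slots: int) -> dict[str, int]:
    """Greedy fractional knapsack: fill free slots in order of decreasing value density.

    Each job is an item whose weight is its pending task count. The last job
    reached may receive only part of its demand (the "fractional" item); since
    slots and demands are integers, that part is still a whole number of slots.
    """
    allocation = {job.name: 0 for job in jobs}
    remaining = free_slots
    for job in sorted(jobs, key=priority_density, reverse=True):
        if remaining == 0:
            break
        take = min(job.pending_tasks, remaining)
        allocation[job.name] = take
        remaining -= take
    return allocation


def order_tasks_for_node(task_hosts: dict[str, set[str]], node: str) -> list[str]:
    """Place tasks whose input data is stored on `node` ahead of remote tasks.

    sorted() is stable, so tasks keep their original relative order within
    the local group and within the remote group.
    """
    return sorted(task_hosts, key=lambda task: node not in task_hosts[task])
```

For example, with the hypothetical jobs Job("terasort", 8, 30.0, 0.0), Job("wordcount", 3, 5.0, 10.0) and Job("satellite", 5, 20.0, 40.0) and six free slots, assign_slots gives wordcount 3 slots, satellite 3 and terasort 0, because the short-task, long-waiting jobs have the highest value density. Ranking by value per slot rather than by total value is what makes the greedy choice optimal for the fractional knapsack.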
Pages: 161-167
Number of pages: 7