Stage Delay Scheduling: Speeding up DAG-style Data Analytics Jobs with Resource Interleaving

Cited by: 9
Authors
Shao, Wujie [1]
Xu, Fei [1]
Chen, Li [2]
Zheng, Haoyue [1]
Liu, Fangming [3]
Affiliations
[1] East China Normal Univ, Dept Comp Sci & Technol, Shanghai Key Lab Multidimens Informat Proc, Shanghai, Peoples R China
[2] Univ Louisiana Lafayette, Dept Comp Sci, Lafayette, LA 70504 USA
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Hubei, Peoples R China
Keywords
stage delay scheduling; parallel stages; resource interleaving; job completion time; big data analytics
DOI
10.1145/3337821.3337872
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To increase the resource utilization of datacenters, big data analytics jobs commonly run stages in parallel, with the stages organized into and scheduled according to a Directed Acyclic Graph (DAG). Through an in-depth analysis of the latest Alibaba cluster trace and our motivation experiments on Amazon EC2, however, we show that CPU and network resources remain under-utilized because of poor stage scheduling, which prolongs the completion time of a DAG-style job (e.g., a Spark job). While existing work on reducing job completion time focuses on either task scheduling or job scheduling, stage scheduling has received comparatively little attention. In this paper, we design and implement DelayStage, a simple yet effective stage delay scheduling strategy that interleaves cluster resources across parallel stages, so as to increase cluster resource utilization and speed up job execution. With the aim of minimizing the makespan of parallel stages, DelayStage judiciously arranges the execution of stages in a pipelined manner to maximize the performance benefits of resource interleaving. Extensive prototype experiments on 30 Amazon EC2 instances and complementary trace-driven simulations show that DelayStage can improve cluster resource utilization by up to 81.8% and reduce job completion time by up to 41.3% in comparison to stock Spark and state-of-the-art stage scheduling strategies, with acceptable runtime overhead.
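The idea of resource interleaving in the abstract can be illustrated with a toy model. The sketch below is not DelayStage's algorithm or cost model; it is a minimal illustration under stated assumptions: each stage has a network phase followed by a CPU phase, each resource is shared equally among the stages using it at the same moment, and the function name `makespan`, the stage sizes, and the candidate delays are all hypothetical.

```python
from typing import List, Tuple


def makespan(stages: List[Tuple[float, float]], delays: List[float]) -> float:
    """Toy fluid model (an assumption, not the paper's model).

    stages[i] = (network_work, cpu_work) for stage i, measured as the time
    each phase would take with the resource to itself.
    delays[i] = time at which stage i is allowed to start.
    Stages that use the same resource at the same time share it equally.
    """
    remaining = [[net, cpu] for net, cpu in stages]
    t = 0.0
    while any(r[0] > 0 or r[1] > 0 for r in remaining):
        # Phase in use per started, unfinished stage: 0 = network, 1 = CPU.
        active = {}
        for i, r in enumerate(remaining):
            if delays[i] <= t and (r[0] > 0 or r[1] > 0):
                active[i] = 0 if r[0] > 0 else 1
        users = [sum(1 for p in active.values() if p == k) for k in (0, 1)]
        # Next event: some phase finishes, or a delayed stage starts.
        candidates = [remaining[i][p] * users[p] for i, p in active.items()]
        candidates += [d - t for d in delays if d > t]
        dt = min(candidates)
        for i, p in active.items():
            remaining[i][p] -= dt / users[p]
            if remaining[i][p] < 1e-9:  # guard against float residue
                remaining[i][p] = 0.0
        t += dt
    return t


# Two identical parallel stages (hypothetical sizes).
stages = [(4.0, 4.0), (4.0, 4.0)]
no_delay = makespan(stages, [0.0, 0.0])
with_delay = makespan(stages, [0.0, 4.0])
# Brute-force search over a few candidate delays for the second stage.
best_delay = min(range(0, 9), key=lambda d: makespan(stages, [0.0, float(d)]))
print(no_delay, with_delay, best_delay)
```

In this toy setting, starting both stages together makes them contend for the network and then for the CPU, while delaying the second stage lets its network phase overlap with the first stage's CPU phase, which shortens the makespan from 16 to 12 time units.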
Pages: 11
Related papers
  • [1] Branch Scheduling: DAG-Aware Scheduling for Speeding up Data-Parallel Jobs
    Hu, Zhiyao
    Li, Dongsheng
    Zhang, Yiming
    Guo, Deke
    Li, Ziyang
    PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS 2019), 2019
  • [2] Accelerating DAG-Style Job Execution via Optimizing Resource Pipeline Scheduling
    Duan, Yubin
    Wang, Ning
    Wu, Jie
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2022, 37 (04): 852-868
  • [3] Cluster Fair Queueing: Speeding up Data-Parallel Jobs with Delay Guarantees
    Chen, Chen
    Wang, Wei
    Zhang, Shengkai
    Li, Bo
    IEEE INFOCOM 2017 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2017