Stage Delay Scheduling: Speeding up DAG-style Data Analytics Jobs with Resource Interleaving

Cited by: 9

Authors:

Shao, Wujie [1]

Xu, Fei [1]

Chen, Li [2]

Zheng, Haoyue [1]

Liu, Fangming [3]

Affiliations:

[1] East China Normal Univ, Dept Comp Sci & Technol, Shanghai Key Lab Multidimens Informat Proc, Shanghai, Peoples R China

[2] Univ Louisiana Lafayette, Dept Comp Sci, Lafayette, LA 70504 USA

[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Hubei, Peoples R China

Source:

PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019) | 2019

Keywords:

stage delay scheduling; parallel stages; resource interleaving; job completion time; big data analytics

DOI:

10.1145/3337821.3337872

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

To increase datacenter resource utilization, big data analytics jobs commonly run stages in parallel, with the stages organized into and scheduled according to a Directed Acyclic Graph (DAG). Through an in-depth analysis of the latest Alibaba cluster trace and our motivation experiments on Amazon EC2, however, we show that CPU and network resources are still under-utilized due to unwise stage scheduling, which prolongs the completion time of a DAG-style job (e.g., a Spark job). While existing work on reducing job completion time focuses on either task scheduling or job scheduling, stage scheduling has received comparatively little attention. In this paper, we design and implement DelayStage, a simple yet effective stage delay scheduling strategy that interleaves cluster resources across parallel stages to increase cluster resource utilization and speed up job execution. With the aim of minimizing the makespan of parallel stages, DelayStage judiciously arranges the execution of stages in a pipelined manner to maximize the performance benefits of resource interleaving. Extensive prototype experiments on 30 Amazon EC2 instances and complementary trace-driven simulations show that DelayStage can improve cluster resource utilization by up to 81.8% and reduce job completion time by up to 41.3% compared with stock Spark and state-of-the-art stage scheduling strategies, with acceptable runtime overhead.


Pages: 11

