A fine-grained GPU sharing and job scheduling for deep learning jobs on the cloud

Times cited: 0

Authors:

Chung, Wu-Chun [1]

Tong, Jyun-Sen [1]

Chen, Zhi-Hao [1]

Affiliation:

[1] Chung Yuan Christian Univ, Dept Informat & Comp Engn, Taoyuan 320, Taiwan

Source:

JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / No. 02

Keywords:

Deep learning; GPU sharing; Resource allocation; Job scheduling; Cloud computing

DOI:

10.1007/s11227-024-06849-5

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper introduces a GPU sharing and scheduling method that addresses resource wastage and underutilization in deep learning training jobs. Existing methods rely on execution-time estimation models and rarely explore fine-grained GPU sharing. Our approach uses a suspend-and-resume mechanism to save and migrate model training states. Using a lightweight sampling analysis to predict job completion times, the proposed method mitigates starvation of large jobs and reuses fragmented resources. By making efficient use of these fragmented resources, the scheduler reduces both job completion and waiting times. Performance is evaluated with Microsoft Philly trace data and TF-Slim benchmarks on four image classification models. Compared to traditional methods, our approach increases resource utilization by a factor of 4.1 and reduces completion time by a factor of 3.6. The proposed method improves deep learning training efficiency and the use of idle GPU resources, providing a flexible and efficient solution for future training needs.
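The abstract names three scheduling ingredients: sampling a few training iterations to predict job completion time, reusing fragmented GPUs, and preventing large jobs from starving. The sketch below illustrates these ideas under our own assumptions and is not the authors' scheduler. `Job`, `predict_runtime`, `schedule`, the one-iteration warm-up skip, and the `starvation_limit` aging threshold are all hypothetical. The sketch estimates runtime as the mean sampled iteration time multiplied by total iterations. It then starts jobs in shortest-predicted-first order, backfills whatever GPUs remain, and reserves the free GPUs for any job that has waited past the threshold. The suspend-and-resume migration of training state is not modeled.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative assumptions."""
    name: str
    gpus: int                        # GPUs requested
    total_iters: int                 # planned training iterations
    sampled_iter_times: list[float]  # seconds/iteration from a short sampling run
    wait: float = 0.0                # seconds spent in the queue so far


def predict_runtime(job: Job, warmup: int = 1) -> float:
    """Extrapolate runtime from sampled iterations, skipping warm-up ones."""
    # Early iterations are often slower (graph building, caching); drop them
    # unless that would leave no samples at all.
    samples = job.sampled_iter_times[warmup:] or job.sampled_iter_times
    return mean(samples) * job.total_iters


def schedule(queue: list[Job], free_gpus: int,
             starvation_limit: float) -> tuple[list[str], int]:
    """Return (names of jobs to start now, GPUs left free)."""
    started: list[str] = []
    # Jobs past the aging threshold are served first, longest wait first.
    starving = sorted((j for j in queue if j.wait >= starvation_limit),
                      key=lambda j: j.wait, reverse=True)
    # Everyone else runs shortest-predicted-runtime first.
    others = sorted((j for j in queue if j.wait < starvation_limit),
                    key=predict_runtime)
    for job in starving:
        if job.gpus > free_gpus:
            # Reserve the free GPUs for this job instead of letting smaller
            # jobs keep consuming them, so large jobs cannot starve forever.
            return started, free_gpus
        started.append(job.name)
        free_gpus -= job.gpus
    for job in others:
        # Backfill: any job that fits in the remaining (fragmented) GPUs starts.
        if job.gpus <= free_gpus:
            started.append(job.name)
            free_gpus -= job.gpus
    return started, free_gpus


if __name__ == "__main__":
    jobs = [
        Job("big", 4, 1000, [2.0, 0.5, 0.5]),
        Job("small", 1, 200, [1.0, 0.25, 0.25]),
        Job("medium", 2, 400, [1.0, 0.5, 0.5]),
    ]
    print(schedule(jobs, free_gpus=3, starvation_limit=3600))  # (['small', 'medium'], 0)
    jobs[0].wait = 4000
    print(schedule(jobs, free_gpus=3, starvation_limit=3600))  # ([], 3)
```

Suppose three GPUs are free and the queue is new. In that case, the short 1-GPU and 2-GPU jobs start and the 4-GPU job waits. Once that job's wait exceeds `starvation_limit`, `schedule` starts nothing and holds the GPUs until four are free. Holding GPUs idle gives up some short-term utilization in exchange for bounded waiting, which is one common way to address starvation. The paper's actual policy may differ.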

Pages: 30
