A fine-grained GPU sharing and job scheduling for deep learning jobs on the cloud

Cited by: 0
Authors
Chung, Wu-Chun [1]
Tong, Jyun-Sen [1]
Chen, Zhi-Hao [1]
Affiliations
[1] Chung Yuan Christian Univ, Dept Informat & Comp Engn, Taoyuan 320, Taiwan
Source
JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / Issue 02
Keywords
Deep learning; GPU sharing; Resource allocation; Job scheduling; Cloud computing
DOI
10.1007/s11227-024-06849-5
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces a GPU sharing and scheduling method that addresses resource wastage and underutilization in deep learning training jobs. Existing methods rely on execution time estimation models and rarely explore fine-grained GPU sharing. The proposed approach uses a suspend and resume mechanism to save and migrate model training states, and a lightweight sampling analysis to predict job completion times. Together these let the scheduler avoid starvation of large jobs and reuse fragmented resources, which reduces job completion and waiting times. Performance is evaluated using Microsoft Philly data and TF-Slim benchmarks on four image classification models. Compared to traditional methods, the approach increases resource utilization by a factor of 4.1 and reduces completion time by a factor of 3.6. The method improves deep learning training efficiency and makes better use of idle GPU resources, providing a flexible solution for future training needs.
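The two mechanisms named in the abstract, sampling-based completion time prediction and suspend/resume of training state, can be illustrated with a minimal Python sketch. This is not the authors' implementation: `TrainingState`, `train_step`, `estimate_remaining_seconds`, `suspend`, `resume`, and `demo` are hypothetical names, the linear extrapolation is an assumed estimator, and a JSON file stands in for a real model checkpoint.

```python
import json
import os
import tempfile
import time
from dataclasses import asdict, dataclass


@dataclass
class TrainingState:
    """Hypothetical stand-in for the state a real job would checkpoint."""
    job_id: str
    step: int
    total_steps: int


def train_step(state):
    """Placeholder for one training iteration; only advances the counter."""
    state.step += 1


def estimate_remaining_seconds(step_fn, state, sample_steps=5):
    """Time a few real steps, then extrapolate linearly to the steps left.

    Assumption: per-step time is roughly constant over the whole job.
    """
    start = time.perf_counter()
    for _ in range(sample_steps):
        step_fn(state)
    per_step = (time.perf_counter() - start) / sample_steps
    return per_step * (state.total_steps - state.step)


def suspend(state, path):
    """Persist the training state so the job can release its GPU share."""
    with open(path, "w") as f:
        json.dump(asdict(state), f)


def resume(path):
    """Reload a suspended job's state, possibly on a different node."""
    with open(path) as f:
        return TrainingState(**json.load(f))


def demo():
    """Sample a job, suspend it, and resume it from the saved file."""
    state = TrainingState(job_id="job-a", step=0, total_steps=100)
    remaining = estimate_remaining_seconds(train_step, state)
    path = os.path.join(tempfile.mkdtemp(), "job-a.json")
    suspend(state, path)
    return remaining, state, resume(path)
```

In this sketch the sampled steps count toward the job's progress, so sampling does not waste work. A scheduler could use the estimate to decide which job to suspend when a fragment of GPU capacity becomes available.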
Pages: 30