Sampling-Based Multi-Job Placement for Heterogeneous Deep Learning Clusters

Cited by: 0
Authors
Liu, Kaiyang [1 ]
Wang, Jingrong [2 ]
Huang, Zhiming [3 ]
Pan, Jianping [3 ]
Affiliations
[1] Mem Univ Newfoundland, Dept Comp Sci, St John, NF A1B 3X5, Canada
[2] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada
[3] Univ Victoria, Dept Comp Sci, Victoria, BC V8P 5C2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Training; Deep learning; Load management; Processor scheduling; Computational modeling; Throughput; Parallel processing; Distributed deep learning; job placement; job sizing; load balancing; heterogeneity-aware scheduling; fairness;
DOI
10.1109/TPDS.2024.3390109
CLC number (Chinese Library Classification)
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
Heterogeneous deep learning clusters commonly host a variety of distributed learning jobs. In such clusters, the training efficiency of a learning model is limited by its slowest worker. Meanwhile, multiple learning jobs compete for limited computational resources to accelerate their training, which poses significant challenges for multi-job placement among heterogeneous workers. This article presents a heterogeneity-aware scheduler that solves the multi-job placement problem jointly with job sizing and load balancing, minimizing the average Job Completion Time (JCT) of deep learning jobs. A novel scheme based on proportional training workload assignment, feasible solution categorization, and matching markets is proposed with theoretical guarantees. To further reduce the computational complexity for low-latency decision-making and improve scheduling fairness, we propose to sparsify the feasible solution categories through sampling, which incurs negligible performance loss in JCT. We evaluate our design with real-world deep neural network benchmarks on heterogeneous computing clusters. Experimental results show that, compared with existing solutions, the proposed sampling-based scheme achieves 1) a JCT within 2.04% of the optimal with orders-of-magnitude improvements in algorithm running time, and 2) high scheduling fairness among learning jobs.
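As a rough illustration of the proportional training workload assignment idea mentioned in the abstract, the minimal Python sketch below splits a global batch across heterogeneous workers in proportion to their measured throughput, so that per-step times are roughly balanced and no worker becomes the straggler. The worker speeds, function name, and rounding rule are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: assign per-worker batch sizes proportional to
# measured throughput (samples/sec), so each worker finishes a training step
# at roughly the same time. Speeds and rounding policy are assumptions.

def proportional_batch_split(global_batch, speeds):
    """Return per-worker batch sizes proportional to worker speeds."""
    total_speed = sum(speeds)
    # Fractional proportional shares, rounded down.
    shares = [int(global_batch * s / total_speed) for s in speeds]
    # Hand the leftover samples to the fastest workers first.
    leftover = global_batch - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares

if __name__ == "__main__":
    speeds = [220.0, 150.0, 95.0]   # hypothetical samples/sec for three GPU types
    shares = proportional_batch_split(512, speeds)
    print(shares)                   # [243, 165, 104]
    # Per-step times (batch / speed) differ by less than 1% across workers.
    print([round(b / s, 3) for b, s in zip(shares, speeds)])
```

With the example speeds, each worker's per-step time is nearly identical, which is the load-balancing effect that proportional workload assignment aims for; the paper's actual scheme additionally handles job sizing and multi-job placement, which this sketch does not model.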
Pages: 874-888
Page count: 15