Preemptive Scheduling for Distributed Machine Learning Jobs in Edge-Cloud Networks

Cited by: 5
Authors
Wang, Ne [1]
Zhou, Ruiting [1,2]
Jiao, Lei [3]
Zhang, Renli [1,2]
Li, Bo [4]
Li, Zongpeng [1,5]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Wuhan Univ, Sch Cyber Sci & Engn, Minist Educ, Key Lab Aerosp Informat Secur & Trusted Comp, Wuhan 430072, Peoples R China
[3] Univ Oregon, Dept Comp & Informat Sci, Eugene, OR 97403 USA
[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[5] Tsinghua Univ, Inst Network Sci & Cyberspace, Beijing 100190, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Distributed machine learning; parameter server architecture; preemptive scheduling; edge-cloud networks;
DOI
10.1109/JSAC.2022.3180772
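Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code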
0808 ; 0809 ;
Abstract
Recent advances in 5G and edge computing enable rapid development and deployment of edge-cloud systems, which are ideal for delay-sensitive machine learning (ML) applications such as autonomous driving and smart cities. Distributed ML jobs often need to train a large model with enormous datasets, which can only be handled by deploying a distributed set of workers in an edge-cloud system. One common approach is to employ a parameter server (PS) architecture, in which training is carried out at multiple workers, while PSs are used for aggregation and model updates. In this architecture, one of the fundamental challenges is how to dispatch ML jobs to workers and PSs such that the average job completion time (JCT) can be minimized. In this work, we propose a novel online preemptive scheduling framework to decide the location and the execution time window of concurrent workers and PSs upon each job arrival. Specifically, our proposed scheduling framework consists of: i) a job dispatching and scheduling algorithm that assigns each ML job to workers and decides the schedule to train each data chunk; ii) a PS assignment algorithm that determines the placement of PSs. We prove theoretically that our proposed algorithm is $D_{\max}(1 + 1/\epsilon)$-competitive with $(1 + \epsilon)$-speed augmentation, where $D_{\max}$ is the maximal number of data chunks in any job. Extensive testbed experiments and trace-driven simulations show that our algorithm can reduce the average JCT by up to 30% compared with state-of-the-art baselines.
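As a rough illustration of the guarantee quoted in the abstract, the sketch below evaluates the stated bound D_max(1 + 1/epsilon) for a few values of epsilon. The function name `competitive_bound` and the sample values are assumptions made for illustration only; they do not come from the paper, and the code does not implement the authors' scheduling algorithm. It only shows the trade-off the formula implies: a larger speed augmentation (1 + epsilon) gives a smaller bound.

```python
def competitive_bound(d_max: int, epsilon: float) -> float:
    """Evaluate D_max * (1 + 1/epsilon), the bound stated in the abstract.

    d_max   -- maximal number of data chunks in any job (hypothetical input)
    epsilon -- speed-augmentation parameter, must be positive
    """
    if d_max < 1:
        raise ValueError("d_max must be a positive number of data chunks")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return d_max * (1.0 + 1.0 / epsilon)


if __name__ == "__main__":
    # Illustrative values only: 10 data chunks, three augmentation levels.
    for eps in (0.1, 0.5, 1.0):
        bound = competitive_bound(10, eps)
        print(f"epsilon={eps}: speed factor {1 + eps:.1f}, bound {bound:.1f}")
```

The bound grows linearly in D_max and shrinks toward D_max as epsilon grows, so it is most informative for jobs with few data chunks.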
Pages: 2411 - 2425
Number of pages: 15
Related papers
50 in total
  • [21] Collaborative Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud Network
    Shen, Shihao
    Han, Yiwen
    Wang, Xiaofei
    Wang, Shiqiang
    Leung, Victor C. M.
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (06) : 2950 - 2964
  • [22] Reinforcement learning empowered multi-AGV offloading scheduling in edge-cloud IIoT
    Peng Liu
    Zhe Liu
    Ji Wang
    Zifu Wu
    Peng Li
    Huijuan Lu
    [J]. Journal of Cloud Computing, 11
  • [23] Resource Utilization of Distributed Databases in Edge-Cloud Environment
    Mansouri, Yaser
    Prokhorenko, Victor
    Ullah, Faheem
    Babar, Muhammad Ali
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (11) : 9423 - 9437
  • [24] Intelligent Machine Tool Based on Edge-Cloud Collaboration
    Lou, Ping
    Liu, Shiyu
    Hu, Jianmin
    Li, Ruiya
    Xiao, Zheng
    Yan, Junwei
    [J]. IEEE ACCESS, 2020, 8 : 139953 - 139965
  • [25] Distributed Edge Machine Learning Pipeline Scheduling with Reverse Auctions
    Imes, Connor
    King, David W.
    Walters, John Paul
    [J]. 2023 EIGHTH INTERNATIONAL CONFERENCE ON FOG AND MOBILE EDGE COMPUTING, FMEC, 2023, : 196 - 203
  • [26] Edge-Cloud Computing for Scheduling the Energy Consumption in Smart Grid
    Alorf, Abdulaziz
    [J]. Computer Systems Science and Engineering, 2023, 46 (01) : 273 - 286
  • [27] Edge-Cloud Resource Scheduling in Space-Air-Ground-Integrated Networks for Internet of Vehicles
    Cao, Bin
    Zhang, Jintong
    Liu, Xin
    Sun, Zhiheng
    Cao, Wenxi
    Nowak, Robert M.
    Lv, Zhihan
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (08) : 5765 - 5772
  • [28] Leveraging the serverless paradigm for realizing machine learning pipelines across the edge-cloud continuum
    Paraskevoulakou, Efterpi
    Kyriazis, Dimosthenis
    [J]. 2021 24TH CONFERENCE ON INNOVATION IN CLOUDS, INTERNET AND NETWORKS AND WORKSHOPS (ICIN), 2021,
  • [29] Hybrid Learning for Orchestrating Deep Learning Inference in Multi-user Edge-cloud Networks
    Shahhosseini, Sina
    Hu, Tianyi
    Seo, Dongjoo
    Kanduri, Anil
    Donyanavard, Bryan
    Rahmani, Amir M.
    Dutt, Nikil
    [J]. PROCEEDINGS OF THE TWENTY THIRD INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2022), 2022, : 1 - 6
  • [30] Online Training Flow Scheduling for Geo-Distributed Machine Learning Jobs Over Heterogeneous and Dynamic Networks
    Fan, Lang
    Zhang, Xiaoning
    Zhao, Yangming
    Sood, Keshav
    Yu, Shui
    [J]. IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2024, 10 (01) : 277 - 291