Orchestra: Adaptively Accelerating Distributed Deep Learning in Heterogeneous Environments

Cited by: 1
Authors
Du, Haizhou [1 ]
Huang, Sheng [1 ]
Xiang, Qiao [2 ]
Affiliations
[1] Shanghai Univ Elect Power, Shanghai, Peoples R China
[2] Xiamen Univ, Xiamen, Peoples R China
Keywords
Distributed Deep Learning; Local Update Adaptation; Load-Balance; Heterogeneous Environments;
DOI
10.1145/3528416.3530246
CLC Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
The synchronized Local-SGD (stochastic gradient descent) strategy has become popular in distributed deep learning (DML) because it effectively reduces the frequency of model communication while ensuring global model convergence. However, it performs poorly and leads to excessive training time in heterogeneous environments due to differences in worker performance. In particular, in data-unbalanced scenarios, these differences between workers can aggravate low resource utilization and eventually produce stragglers, which seriously hurt the whole training procedure. Existing solutions either suffer from the heterogeneity of computing resources or do not fully address environment dynamics. In this paper, we eliminate the negative impact of dynamic resource constraints in heterogeneous DML environments with a novel adaptive load-balancing framework called Orchestra. The main idea of Orchestra is to improve resource utilization by balancing the load across workers of different performance and unbalanced data volumes. Additionally, one of Orchestra's key features is adapting the number of local updates per worker at each epoch. To achieve this, we propose a distributed deep reinforcement learning-driven algorithm that lets each worker dynamically determine its number of local updates and its training data volume, subject to mini-batch time cost and resource constraints at each epoch. Our design significantly improves model convergence speed in DML compared with other state-of-the-art approaches.
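The per-round scheduling idea the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not Orchestra's actual algorithm: the deep-reinforcement-learning policy is replaced with a simple proportional heuristic, and every name below (adapt_local_updates, adapt_data_volume, the worker labels) is a hypothetical placeholder, not an API from the paper.

```python
# Sketch of heterogeneity-aware Local-SGD round scheduling.
# Assumption: each worker's per-mini-batch time is measurable, and we
# want every worker to finish a synchronization round in roughly the
# same wall time, so fast workers do more local updates and slow
# workers hold less data. Orchestra's DRL policy is replaced here by
# a closed-form proportional rule for illustration only.

def adapt_local_updates(batch_times, budget_s):
    """Give each worker as many local updates as fit in the time budget."""
    return {w: max(1, int(budget_s / t)) for w, t in batch_times.items()}

def adapt_data_volume(batch_times, total_samples):
    """Assign data volume inversely proportional to per-batch cost."""
    speed = {w: 1.0 / t for w, t in batch_times.items()}
    total = sum(speed.values())
    return {w: int(total_samples * s / total) for w, s in speed.items()}

if __name__ == "__main__":
    # Measured seconds per mini-batch on three heterogeneous workers.
    batch_times = {"gpu_a": 0.02, "gpu_b": 0.05, "cpu_c": 0.20}
    budget_s = 1.0  # target wall time per synchronization round

    updates = adapt_local_updates(batch_times, budget_s)
    volumes = adapt_data_volume(batch_times, total_samples=60_000)
    for w in batch_times:
        print(f"{w}: {updates[w]} local updates, {volumes[w]} samples, "
              f"~{updates[w] * batch_times[w]:.2f}s per round")
```

In Orchestra itself, per the abstract, these two per-worker decisions are made jointly by a distributed DRL agent under resource constraints rather than by a fixed rule; the heuristic above only shows why equalizing per-round wall time removes stragglers.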
Pages: 181-184
Page count: 4
Related Papers
50 records in total
  • [21] Accelerating Distributed Deep Reinforcement Learning by In-Network Experience Sampling
    Furukawa, Masaki
    Matsutani, Hiroki
    30TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2022), 2022 : 75 - 82
  • [22] Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms
    Han, Rui
    Li, Shilin
    Wang, Xiangwei
    Liu, Chi Harold
    Xin, Gaofeng
    Chen, Lydia Y.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (07) : 1591 - 1602
  • [23] Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
    Lin, Tao
    Karimireddy, Sai Praneeth
    Stich, Sebastian U.
    Jaggi, Martin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [24] A One-Shot Framework for Distributed Clustered Learning in Heterogeneous Environments
    Armacki, Aleksandar
    Bajovic, Dragana
    Jakovetic, Dusan
    Kar, Soummya
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 636 - 651
  • [25] Hierarchical Heterogeneous Cluster Systems for Scalable Distributed Deep Learning
    Wang, Yibo
    Geng, Tongsheng
    Silva, Ericson
    Gaudiot, Jean-Luc
    2024 IEEE 27TH INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING (ISORC 2024), 2024
  • [26] Distributed Deep Learning With GPU-FPGA Heterogeneous Computing
    Tanaka, Kenji
    Arikawa, Yuki
    Ito, Tsuyoshi
    Morita, Kazutaka
    Nemoto, Naru
    Terada, Kazuhiko
    Teramoto, Junji
    Sakamoto, Takeshi
    IEEE MICRO, 2021, 41 (01) : 15 - 22
  • [27] Straggler-Aware In-Network Aggregation for Accelerating Distributed Deep Learning
    Lee, Hochan
    Lee, Jaewook
    Kim, Heewon
    Pack, Sangheon
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2023, 16 (06) : 4198 - 4204
  • [28] An Efficient Deep-Learning-Based Super-Resolution Accelerating SoC With Heterogeneous Accelerating and Hierarchical Cache
    Li, Zhiyong
    Kim, Sangjin
    Im, Dongseok
    Han, Donghyeon
    Yoo, Hoi-Jun
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2023, 58 (03) : 614 - 623
  • [29] Adaptively heterogeneous transfer learning for hyperspectral image classification
    Zhao, Zihao
    Chen, Yushi
    He, Xin
    REMOTE SENSING LETTERS, 2022, 13 (12) : 1182 - 1193
  • [30] Accelerating deep learning with memcomputing
    Manukian, Haik
    Traversa, Fabio L.
    Di Ventra, Massimiliano
    NEURAL NETWORKS, 2019, 110 : 1 - 7