ASHL: An Adaptive Multi-Stage Distributed Deep Learning Training Scheme for Heterogeneous Environments

Cited: 0
Authors
Shen, Zhaoyan [1 ]
Tang, Qingxiang [1 ]
Zhou, Tianren [1 ]
Zhang, Yuhao [2 ]
Jia, Zhiping [1 ]
Yu, Dongxiao [1 ]
Zhang, Zhiyong [3 ]
Li, Bingzhe [4 ]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Quan Cheng Lab, Jinan 250103, Peoples R China
[4] Univ Texas Dallas, Comp Sci Dept, Richardson, TX 75080 USA
Keywords
Distributed deep learning; parameter server; data parallelism; optimization
DOI
10.1109/TC.2023.3315847
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
With the growth of data set and model sizes, distributed deep learning has been proposed to accelerate training and improve the accuracy of DNN models. The parameter server framework is a popular collaborative architecture for data-parallel training, which works well in homogeneous environments by properly aggregating the computation/communication capabilities of different workers. In heterogeneous environments, however, the resources of different workers vary greatly, and a few stragglers may severely slow down the whole training process. In this paper, we propose an adaptive multi-stage distributed deep learning training framework, named ASHL, for heterogeneous environments. First, a profiling scheme is proposed to capture the capability of each worker, so that the training and communication tasks on each worker can be planned reasonably and the foundation for formal training is laid. Second, a hybrid-mode training scheme (i.e., coarse-grained and fine-grained training) is proposed to balance model accuracy and training speed. The coarse-grained training stage (named AHL) adopts an asynchronous communication strategy with less frequent communication; its main goal is to make the model converge quickly to a certain level. The fine-grained training stage (named SHL) uses a semi-asynchronous communication strategy with a high communication frequency; its main goal is to improve the final convergence quality of the model. Finally, a compression-based communication scheme is proposed to further improve the communication efficiency of the training process. Our experimental results show that ASHL reduces the overall training time by more than 35% to reach the same convergence level and has better generalization ability than state-of-the-art schemes such as ADSP.
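To make the staged workflow in the abstract concrete, below is a minimal, hypothetical Python sketch of the profile -> coarse-grained (AHL) -> fine-grained (SHL) schedule. All function and parameter names (profile_worker, plan_tasks, coarse_stage, fine_stage) are placeholders introduced here for illustration and do not reflect the authors' actual implementation or API.

import random

def profile_worker(worker_id):
    # Profiling phase: estimate each worker's compute/communication capability.
    # Stand-in for measuring per-iteration compute time and link bandwidth.
    return {"compute": random.uniform(0.5, 2.0), "bandwidth": random.uniform(10, 100)}

def plan_tasks(profiles):
    # Plan per-worker workload from the profiles so that fast and slow
    # workers finish an iteration in roughly the same time.
    fastest = min(p["compute"] for p in profiles.values())
    return {w: {"batch_scale": fastest / p["compute"]} for w, p in profiles.items()}

def coarse_stage(plan, epochs):
    # AHL: asynchronous updates with a low communication frequency,
    # aiming for quick convergence to a rough accuracy level.
    for _ in range(epochs):
        pass  # workers push (compressed) updates infrequently, no global barrier

def fine_stage(plan, epochs):
    # SHL: semi-asynchronous updates with a high communication frequency,
    # bounding staleness between workers to refine the final model.
    for _ in range(epochs):
        pass  # workers synchronize often; staleness between them is capped

workers = ["w0", "w1", "w2"]
profiles = {w: profile_worker(w) for w in workers}
plan = plan_tasks(profiles)
coarse_stage(plan, epochs=20)
fine_stage(plan, epochs=10)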
Pages: 30-43 (14 pages)
Related Papers
50 records in total
  • [31] A Multi-Stage Deep Learning Approach for Business Process Event Prediction
    Mehdiyev, Nijat
    Fettke, Peter
    Evermann, Joerg
    2017 IEEE 19TH CONFERENCE ON BUSINESS INFORMATICS (CBI), VOL 1, 2017, 1 : 119 - 128
  • [32] Modeling the Training Iteration Time for Heterogeneous Distributed Deep Learning Systems
    Zeng, Yifu
    Chen, Bowei
    Pan, Pulin
    Li, Kenli
    Chen, Guo
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2023, 2023
  • [33] Research on Typhoon Multi-Stage Cloud Characteristics Based on Deep Learning
    Wang, Mengran
    Cao, Yongqiang
    Yao, Jiaqi
    Zhu, Hong
    Zhang, Ningyue
    Ji, Xinhui
    Li, Jing
    Guo, Zichun
    Primavera, Leonardo
    ATMOSPHERE, 2023, 14 (12)
  • [34] Multi-Stage Optimization of Deep Learning Model to Detect Thoracic Complications
    Ratul, Rizwanul Hoque
    Husain, Farah Anjum
    Purnata, Tajmim Hossain
    Pomil, Rifat Alam
    Khandoker, Shaima
    Parvez, Mohammad Zavid
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021: 3000 - 3005
  • [35] A novel distributed deep learning training scheme based on distributed skip mesh list
    Suzuki, Masaya
    Mizutani, Kimihiro
    IEICE COMMUNICATIONS EXPRESS, 2021, 10 (08): 463 - 468
  • [36] Implicitly heterogeneous multi-stage programming for FPGAs
    Chen, Fulong
    Goyal, Rajat
    Westbrook, Edwin
    Taha, Walid
    Journal of Computational Information Systems, 2010, 6 (14): 4915 - 4922
  • [37] An Incremental Iterative Acceleration Architecture in Distributed Heterogeneous Environments With GPUs for Deep Learning
    Zhang, Xuedong
    Tang, Zhuo
    Du, Lifan
    Yang, Li
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (11) : 2823 - 2837
  • [38] Multi-stage formulation of Optimal Distributed Generation Placement using Reinforcement Learning
    Maya, K. N.
    Jasmin, E. A.
    2016 IEEE INTERNATIONAL CONFERENCE ON POWER ELECTRONICS, DRIVES AND ENERGY SYSTEMS (PEDES), 2016,
  • [39] Distributed multi-stage coding of correlated sources
    Saxena, Ankur
    Rose, Kenneth
    DCC: 2008 DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2008, : 312 - 321
  • [40] Multi-stage detection scheme for CDMA systems
    Rezaaifar, E
    LeNgoc, T
    1997 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS I AND II: ENGINEERING INNOVATION: VOYAGE OF DISCOVERY, 1997, : 474 - 477