ASHL: An Adaptive Multi-Stage Distributed Deep Learning Training Scheme for Heterogeneous Environments

Cited by: 0
Authors
Shen, Zhaoyan [1 ]
Tang, Qingxiang [1 ]
Zhou, Tianren [1 ]
Zhang, Yuhao [2 ]
Jia, Zhiping [1 ]
Yu, Dongxiao [1 ]
Zhang, Zhiyong [3 ]
Li, Bingzhe [4 ]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Quan Cheng Lab, Jinan 250103, Peoples R China
[4] Univ Texas Dallas, Comp Sci Dept, Richardson, TX 75080 USA
Keywords
Distributed deep learning; parameter server; data parallelism; optimization
DOI
10.1109/TC.2023.3315847
Chinese Library Classification
TP3 [Computing technology; computer technology]
Subject Classification Code
0812
Abstract
With the growth of dataset and model sizes, distributed deep learning has been proposed to accelerate training and improve the accuracy of DNN models. The parameter server framework is a popular collaborative architecture for data-parallel training, which works well in homogeneous environments by properly aggregating the computation/communication capabilities of different workers. In heterogeneous environments, however, the resources of different workers vary significantly, and stragglers can severely limit the overall training speed. In this paper, we propose an adaptive multi-stage distributed deep learning training framework, named ASHL, for heterogeneous environments. First, a profiling scheme is proposed to capture the capabilities of each worker, so that training and communication tasks can be reasonably planned on each worker, laying the foundation for formal training. Second, a hybrid-mode training scheme (i.e., coarse-grained and fine-grained training) is proposed to balance model accuracy and training speed. The coarse-grained training stage (named AHL) adopts an asynchronous communication strategy with less frequent communication; its main goal is to make the model quickly converge to a certain level. The fine-grained training stage (named SHL) uses a semi-asynchronous communication strategy with a higher communication frequency; its main goal is to improve the model's convergence quality. Finally, a compression-based communication scheme is proposed to further increase the communication efficiency of the training process. Our experimental results show that ASHL reduces the overall training time by more than 35% to reach the same degree of convergence and has better generalization ability than state-of-the-art schemes such as ADSP.
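The semi-asynchronous strategy described for the SHL stage can be illustrated with a staleness-bounded parameter server: workers tag each gradient push with the model version they pulled, and the server rejects pushes whose staleness exceeds a bound, forcing the worker to refresh first. The sketch below is a minimal single-process illustration of that general mechanism; the class name, staleness rule, and parameters are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of a staleness-bounded ("semi-asynchronous")
# parameter server update. Names and the staleness rule are
# illustrative assumptions, not ASHL's actual code.
class SemiAsyncParameterServer:
    def __init__(self, dim, lr=0.1, max_staleness=2):
        self.params = np.zeros(dim)  # global model
        self.version = 0             # global model version
        self.lr = lr
        self.max_staleness = max_staleness

    def pull(self):
        # Worker fetches the current model and its version.
        return self.params.copy(), self.version

    def push(self, grad, worker_version):
        # Reject a gradient computed against a model that is too stale;
        # the worker must pull the latest model and recompute.
        if self.version - worker_version > self.max_staleness:
            return False
        self.params -= self.lr * grad
        self.version += 1
        return True

# Single-process simulation: one worker minimizing ||w - target||^2.
target = np.array([1.0, -2.0])
ps = SemiAsyncParameterServer(dim=2)
for _ in range(200):
    w, v = ps.pull()
    ps.push(2.0 * (w - target), v)  # gradient of the quadratic loss
```

A fully asynchronous scheme (as in the AHL stage) corresponds to an unbounded `max_staleness`, trading update consistency for throughput; tightening the bound recovers more synchronous, higher-quality updates at the cost of waiting.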
Pages: 30-43
Page count: 14
Related Papers
50 in total
  • [21] MOLFAS: a Multi-stage Online Learning Effectiveness Assessment Scheme in MOOC
    Xiao, Fu
    Li, Qun
    Huang, Haiping
    Sun, Lijuan
    Xu, Xiaolong
    PROCEEDINGS OF 2020 IEEE INTERNATIONAL CONFERENCE ON TEACHING, ASSESSMENT, AND LEARNING FOR ENGINEERING (IEEE TALE 2020), 2020, : 31 - 38
  • [22] Multi-stage learning aids applied to hands-on software training
    Rother, Kristian
    Rother, Magdalena
    Pleus, Alexandra
    Belzen, Annette Upmeier Zu
    BRIEFINGS IN BIOINFORMATICS, 2010, 11 (06) : 582 - 586
  • [23] Adaptive hypergraph learning with multi-stage optimizations for image and tag recommendation
    Karantaidis, Georgios
    Sarridis, Ioannis
    Kotropoulos, Constantine
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2021, 97
  • [24] An effective multi-stage evolutionary algorithm for distributed scheduling with splitting jobs in heterogeneous factories
    Guo, Xin
    Deng, Qianwang
    Luo, Qiang
    Xie, Guanhua
    ENGINEERING OPTIMIZATION, 2024,
  • [25] A multi-stage deep learning based algorithm for multiscale model reduction
    Chung, Eric
    Leung, Wing Tat
    Pun, Sai-Mang
    Zhang, Zecheng
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2021, 394 (394)
  • [26] Multi-stage Deep Learning Technique for Improving Traffic Sign Recognition
    Sanjeewani, Pubudu
    Verma, Brijesh
    Affum, Joseph
    PROCEEDINGS OF THE 2021 36TH INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2021,
  • [27] Addressing Reward Engineering for Deep Reinforcement Learning on Multi-stage Task
    Chen, Bin
    Su, Jianhua
    NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 309 - 317
  • [28] Multi-stage deep convolutional learning for people re-identification
    Zhang, Guan-Wen
    Kato, Jien
    Wang, Yu
    Mase, Kenji
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2014, 29 (04): : 265 - 274
  • [29] LUNG CANCER IDENTIFICATION VIA DEEP LEARNING: A MULTI-STAGE WORKFLOW
    Canavesi, Irene
    D'Arnese, Eleonora
    Caramaschi, Sara
    Santambrogio, Marco D.
    2022 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (IEEE ISBI 2022), 2022,
  • [30] A Deep Reinforcement Learning Framework for Multi-Stage Optimized Object Detection
    Siamak, Sobhan
    Mansoori, Eghbal
    2022 10TH RSI INTERNATIONAL CONFERENCE ON ROBOTICS AND MECHATRONICS (ICROM), 2022, : 132 - 138