Training deep neural networks: a static load balancing approach

Cited by: 11
Authors
Moreno-Alvarez, Sergio [1 ]
Haut, Juan M. [2 ]
Paoletti, Mercedes E. [2 ]
Rico-Gallego, Juan A. [1 ]
Diaz-Martin, Juan C. [2 ]
Plaza, Javier [2 ]
Affiliations
[1] Univ Extremadura, Dept Comp Syst Engn & Telemat, Caceres, Spain
[2] Univ Extremadura, Dept Technol Comp & Commun, Caceres, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2020, Vol. 76, Issue 12
Keywords
Deep learning; High-performance computing; Distributed training; Heterogeneous platforms
DOI
10.1007/s11227-020-03200-6
CLC Number
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and fed non-overlapping subsets of the data known as batches. Replicas combine the computed gradients to update their local copies at the end of each batch. However, differences in the performance of the resources assigned to replicas on current heterogeneous platforms induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while classification accuracy is kept constant, the training time decreases substantially with respect to unbalanced training. This is illustrated using heterogeneous computing platforms made up of CPUs and GPUs with different performance.
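As a rough illustration of the balancing rule described in the abstract (per-replica batch size proportional to relative computing capacity), the Python sketch below splits a global batch across heterogeneous workers according to measured relative speeds. The function name, example speed values, and rounding policy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of static batch-size balancing (illustrative, not the paper's code).
# Assumption: each worker's relative speed s_i is measured offline, and the global
# batch size B is split proportionally so per-step compute time is roughly equal.

def balanced_batch_sizes(global_batch: int, speeds: list) -> list:
    """Split `global_batch` across workers proportionally to their relative speeds."""
    total = sum(speeds)
    sizes = [int(global_batch * s / total) for s in speeds]
    # Hand the remainder left by rounding down to the fastest workers first.
    remainder = global_batch - sum(sizes)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:remainder]:
        sizes[i] += 1
    return sizes

# Example: one GPU measured ~4x faster than each of two CPUs, global batch of 128.
print(balanced_batch_sizes(128, [4.0, 1.0, 1.0]))  # -> [86, 21, 21]
```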
Pages: 9739-9754
Page count: 16
Related papers (50 in total)
  • [1] Training deep neural networks: a static load balancing approach
    Sergio Moreno-Álvarez
    Juan M. Haut
    Mercedes E. Paoletti
    Juan A. Rico-Gallego
    Juan C. Díaz-Martín
    Javier Plaza
    The Journal of Supercomputing, 2020, 76 : 9739 - 9754
  • [2] ON TRAINING DEEP NEURAL NETWORKS USING A STREAMING APPROACH
    Duda, Piotr
    Jaworski, Maciej
    Cader, Andrzej
    Wang, Lipo
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2020, 10 (01) : 15 - 26
  • [3] A Deep Reinforcement Learning Approach for Load Balancing In Open Radio Access Networks
    Zafar, Hammad
    Kasparick, Martin
    Maghsudi, Setareh
    Stanczak, Slawomir
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 3723 - 3728
  • [4] Load Balancing for Ultradense Networks: A Deep Reinforcement Learning-Based Approach
    Xu, Yue
    Xu, Wenjun
    Wang, Zhi
    Lin, Jiaru
    Cui, Shuguang
    IEEE INTERNET OF THINGS JOURNAL, 2019, 6 (06): : 9399 - 9412
  • [5] A Gradient Boosting Approach for Training Convolutional and Deep Neural Networks
    Emami, Seyedsaman
    Martinez-Munoz, Gonzalo
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2023, 4 : 313 - 321
  • [6] DLB: A Dynamic Load Balance Strategy for Distributed Training of Deep Neural Networks
    Ye, Qing
    Zhou, Yuhao
    Shi, Mingjia
    Sun, Yanan
    Lv, Jiancheng
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2023, 7 (04): : 1217 - 1227
  • [7] Quasi-static load balancing in local area networks
    Krajewski, M
    Hofmann, U
    KUWAIT JOURNAL OF SCIENCE & ENGINEERING, 1996, 23 (02): : 181 - 197
  • [8] Load Balancing Using Neural Networks Approach for Assisted Content Delivery in Heterogeneous Network
    Sakat, Raid
    Saadoon, Raed
    Abbod, Maysam
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2020, 1038 : 533 - 547
  • [10] A Gradient-Guided Evolutionary Approach to Training Deep Neural Networks
    Yang, Shangshang
    Tian, Ye
    He, Cheng
    Zhang, Xingyi
    Tan, Kay Chen
    Jin, Yaochu
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 4861 - 4875