Training deep neural networks: a static load balancing approach

Cited by: 11
Authors
Moreno-Alvarez, Sergio [1 ]
Haut, Juan M. [2 ]
Paoletti, Mercedes E. [2 ]
Rico-Gallego, Juan A. [1 ]
Diaz-Martin, Juan C. [2 ]
Plaza, Javier [2 ]
Affiliations
[1] Univ Extremadura, Dept Comp Syst Engn & Telemat, Caceres, Spain
[2] Univ Extremadura, Dept Technol Comp & Commun, Caceres, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2020, Vol. 76, No. 12
Keywords
Deep learning; High-performance computing; Distributed training; Heterogeneous platforms;
DOI
10.1007/s11227-020-03200-6
CLC number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and trained on non-overlapping data subsets known as batches. Replicas combine their computed gradients to update their local copies at the end of each batch. However, performance differences among the resources assigned to replicas on current heterogeneous platforms induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while the classification accuracy is kept constant, the training time decreases substantially with respect to unbalanced training. This is illustrated using heterogeneous computing platforms made up of CPUs and GPUs with different performance.
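For illustration only, a minimal sketch of the batch-size balancing idea described in the abstract: each replica receives a share of the global batch proportional to its measured computing capacity (e.g., samples per second). This is not the authors' implementation; the function name, the throughput figures, and the use of throughput as the capacity metric are assumptions.

    # Hypothetical sketch (assumed names and numbers), not the paper's code.
    def balanced_batch_sizes(global_batch, throughputs):
        """Split a global batch among replicas proportionally to their
        measured throughput (samples/second), as a static load balance."""
        total = sum(throughputs)
        sizes = [int(round(global_batch * t / total)) for t in throughputs]
        # Absorb rounding drift so the sizes still sum to the global batch.
        sizes[-1] += global_batch - sum(sizes)
        return sizes

    # Example: one GPU roughly 4x faster than each of two CPU workers.
    print(balanced_batch_sizes(256, [400.0, 100.0, 100.0]))  # -> [171, 43, 42]

With such an assignment, each replica finishes its local batch in roughly the same wall-clock time, so synchronous gradient combination involves little waiting and asynchronous schemes (with their staleness penalty) are not needed.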
Pages: 9739-9754 (16 pages)