Training deep neural networks: a static load balancing approach

Cited by: 11
Authors
Moreno-Alvarez, Sergio [1 ]
Haut, Juan M. [2 ]
Paoletti, Mercedes E. [2 ]
Rico-Gallego, Juan A. [1 ]
Diaz-Martin, Juan C. [2 ]
Plaza, Javier [2 ]
Affiliations
[1] Univ Extremadura, Dept Comp Syst Engn & Telemat, Caceres, Spain
[2] Univ Extremadura, Dept Technol Comp & Commun, Caceres, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2020, Vol. 76, No. 12
Keywords
Deep learning; High-performance computing; Distributed training; Heterogeneous platforms;
DOI
10.1007/s11227-020-03200-6
CLC number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and trained on non-overlapping data subsets known as batches. Replicas combine their computed gradients to update their local copies at the end of each batch. However, performance differences among the resources assigned to replicas on current heterogeneous platforms induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while the classification accuracy is kept constant, the training time decreases substantially with respect to unbalanced training. This is illustrated using heterogeneous computing platforms made up of CPUs and GPUs with different performance.
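For illustration only, a minimal sketch of the batch-size balancing idea described in the abstract: each replica receives a share of the global batch proportional to its measured computing capacity (e.g., samples per second). This is not the authors' implementation; the function name, the throughput figures, and the use of throughput as the capacity metric are assumptions.

    # Hypothetical sketch (assumed names and numbers), not the paper's code.
    def balanced_batch_sizes(global_batch, throughputs):
        """Split a global batch among replicas proportionally to their
        measured throughput (samples/second), as a static load balance."""
        total = sum(throughputs)
        sizes = [int(round(global_batch * t / total)) for t in throughputs]
        # Absorb rounding drift so the sizes still sum to the global batch.
        sizes[-1] += global_batch - sum(sizes)
        return sizes

    # Example: one GPU roughly 4x faster than each of two CPU workers.
    print(balanced_batch_sizes(256, [400.0, 100.0, 100.0]))  # -> [171, 43, 42]

With such an assignment, each replica finishes its local batch in roughly the same wall-clock time, so synchronous gradient combination involves little waiting and asynchronous schemes (with their staleness penalty) are not needed.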
Pages: 9739-9754 (16 pages)