Training deep neural networks: a static load balancing approach

Cited by: 11
Authors
Moreno-Alvarez, Sergio [1 ]
Haut, Juan M. [2 ]
Paoletti, Mercedes E. [2 ]
Rico-Gallego, Juan A. [1 ]
Diaz-Martin, Juan C. [2 ]
Plaza, Javier [2 ]
Affiliations
[1] Univ Extremadura, Dept Comp Syst Engn & Telemat, Caceres, Spain
[2] Univ Extremadura, Dept Technol Comp & Commun, Caceres, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2020, Vol. 76, Issue 12
Keywords
Deep learning; High-performance computing; Distributed training; Heterogeneous platforms
DOI
10.1007/s11227-020-03200-6
CLC Number
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and fed non-overlapping subsets of the data known as batches. Replicas combine the computed gradients to update their local copies at the end of each batch. However, differences in the performance of the resources assigned to replicas on current heterogeneous platforms induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while classification accuracy is kept constant, the training time decreases substantially with respect to unbalanced training. This is illustrated using heterogeneous computing platforms made up of CPUs and GPUs with different performance.
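As a rough illustration of the balancing rule described in the abstract (per-replica batch size proportional to relative computing capacity), the Python sketch below splits a global batch across heterogeneous workers according to measured relative speeds. The function name, example speed values, and rounding policy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of static batch-size balancing (illustrative, not the paper's code).
# Assumption: each worker's relative speed s_i is measured offline, and the global
# batch size B is split proportionally so per-step compute time is roughly equal.

def balanced_batch_sizes(global_batch: int, speeds: list) -> list:
    """Split `global_batch` across workers proportionally to their relative speeds."""
    total = sum(speeds)
    sizes = [int(global_batch * s / total) for s in speeds]
    # Hand the remainder left by rounding down to the fastest workers first.
    remainder = global_batch - sum(sizes)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:remainder]:
        sizes[i] += 1
    return sizes

# Example: one GPU measured ~4x faster than each of two CPUs, global batch of 128.
print(balanced_batch_sizes(128, [4.0, 1.0, 1.0]))  # -> [86, 21, 21]
```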
Pages: 9739-9754
Page count: 16
Related papers (50 in total)
  • [1] Training deep neural networks: a static load balancing approach
    Sergio Moreno-Álvarez
    Juan M. Haut
    Mercedes E. Paoletti
    Juan A. Rico-Gallego
    Juan C. Díaz-Martín
    Javier Plaza
    The Journal of Supercomputing, 2020, 76 : 9739 - 9754
  • [2] ON TRAINING DEEP NEURAL NETWORKS USING A STREAMING APPROACH
    Duda, Piotr
    Jaworski, Maciej
    Cader, Andrzej
    Wang, Lipo
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2020, 10 (01) : 15 - 26
  • [3] A Deep Reinforcement Learning Approach for Load Balancing In Open Radio Access Networks
    Zafar, Hammad
    Kasparick, Martin
    Maghsudi, Setareh
    Stanczak, Slawomir
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 3723 - 3728
  • [4] Load Balancing for Ultradense Networks: A Deep Reinforcement Learning-Based Approach
    Xu, Yue
    Xu, Wenjun
    Wang, Zhi
    Lin, Jiaru
    Cui, Shuguang
    IEEE INTERNET OF THINGS JOURNAL, 2019, 6 (06): : 9399 - 9412
  • [5] A Gradient Boosting Approach for Training Convolutional and Deep Neural Networks
    Emami, Seyedsaman
    Martinez-Munoz, Gonzalo
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2023, 4 : 313 - 321
  • [6] DLB: A Dynamic Load Balance Strategy for Distributed Training of Deep Neural Networks
    Ye, Qing
    Zhou, Yuhao
    Shi, Mingjia
    Sun, Yanan
    Lv, Jiancheng
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2023, 7 (04): : 1217 - 1227
  • [7] Quasi-static load balancing in local area networks
    Krajewski, M
    Hofmann, U
    KUWAIT JOURNAL OF SCIENCE & ENGINEERING, 1996, 23 (02): : 181 - 197
  • [8] Load Balancing Using Neural Networks Approach for Assisted Content Delivery in Heterogeneous Network
    Sakat, Raid
    Saadoon, Raed
    Abbod, Maysam
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2020, 1038 : 533 - 547
  • [10] A Gradient-Guided Evolutionary Approach to Training Deep Neural Networks
    Yang, Shangshang
    Tian, Ye
    He, Cheng
    Zhang, Xingyi
    Tan, Kay Chen
    Jin, Yaochu
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 4861 - 4875