Training deep neural networks: a static load balancing approach

Cited by: 11
Authors
Moreno-Alvarez, Sergio [1]
Haut, Juan M. [2]
Paoletti, Mercedes E. [2]
Rico-Gallego, Juan A. [1]
Diaz-Martin, Juan C. [2]
Plaza, Javier [2]
Affiliations
[1] Univ Extremadura, Dept Comp Syst Engn & Telemat, Caceres, Spain
[2] Univ Extremadura, Dept Technol Comp & Commun, Caceres, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2020 / Vol. 76 / Issue 12
Keywords
Deep learning; High-performance computing; Distributed training; Heterogeneous platforms;
DOI
10.1007/s11227-020-03200-6
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Deep neural networks are currently trained in data-parallel setups on high-performance computing (HPC) platforms: a replica of the full model is assigned to each computational resource, and each replica processes non-overlapping subsets of the data known as batches. At the end of each batch, replicas combine their computed gradients to update their local copies of the model. However, on current heterogeneous platforms, differences in the performance of the resources assigned to replicas induce waiting times when gradients are combined synchronously, which degrades overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: each replica trains using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while classification accuracy is kept constant, training time decreases substantially with respect to unbalanced training. This is illustrated using heterogeneous computing platforms made up of CPUs and GPUs with different performance.
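The proportional batch-size assignment described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `balanced_batch_sizes`, the use of measured throughput (samples per second) as the "relative computing capacity", and the largest-remainder rounding step are all assumptions made for this example.

```python
from typing import List, Sequence


def balanced_batch_sizes(global_batch: int, speeds: Sequence[float]) -> List[int]:
    """Split a global batch across replicas in proportion to their speed.

    `speeds` is a hypothetical per-replica throughput estimate (for example,
    samples per second measured in a short benchmark run). The returned
    local batch sizes always sum to `global_batch`.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty sequence of positive values")
    if global_batch < len(speeds):
        raise ValueError("global_batch must give every replica at least one sample")

    total = sum(speeds)
    # Ideal (fractional) share of the global batch for each replica.
    ideal = [global_batch * s / total for s in speeds]
    # Round down first, then hand out the leftover samples one by one
    # to the replicas with the largest fractional remainders.
    sizes = [int(x) for x in ideal]
    leftover = global_batch - sum(sizes)
    order = sorted(range(len(speeds)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# Two slower devices and one device assumed to be twice as fast.
sizes = balanced_batch_sizes(128, [1.0, 1.0, 2.0])
print(sizes)  # [32, 32, 64]
```

With these sizes, each replica needs roughly the same time per step (local batch size divided by speed), so under the assumptions above the synchronous gradient combination no longer waits on the slowest device.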
Pages: 9739 - 9754
Number of pages: 16