Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

Cited: 0
Authors
Daniel Coquelin
Charlotte Debus
Markus Götz
Fabrice von der Lehr
James Kahn
Martin Siggel
Achim Streit
Affiliations
[1] Karlsruhe Institute of Technology
[2] German Aerospace Center
Keywords
Machine learning; Neural networks; Data parallel training; Multi-node; Multi-GPU; Stale gradients
DOI
Not available
Abstract
With increasing data and model complexities, the time required to train neural networks has become prohibitively long. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the distributed asynchronous and selective optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks, and adjusts the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, as compared to current optimized data parallel training methods.
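The scheme outlined in the abstract combines two communication tiers: gradients are averaged node-locally after every forward-backward pass over the fast intra-node GPU interconnect, while the much slower inter-node synchronization is issued only every few iterations and is allowed to complete asynchronously, with the slightly stale result folded back into the local parameters. The following is a minimal PyTorch sketch of that idea, assuming torch.distributed with the NCCL backend and one process per GPU; the group construction, the fixed node size, the blending factor, and all function names are illustrative assumptions rather than the paper's implementation.

```python
# Hedged sketch of a hierarchical, partially asynchronous synchronization scheme
# in the spirit of DASO. Assumes torch.distributed is already initialized with
# the NCCL backend (one process per GPU); `gpus_per_node`, `global_steps`,
# `blend`, and the function names are illustrative assumptions.
import torch
import torch.distributed as dist


def build_local_group(gpus_per_node: int = 4):
    """One process group per compute node; returns this rank's group.
    Every rank must create every group, in the same order."""
    world, rank = dist.get_world_size(), dist.get_rank()
    groups = [
        dist.new_group(ranks=list(range(s, min(s + gpus_per_node, world))))
        for s in range(0, world, gpus_per_node)
    ]
    return groups[rank // gpus_per_node]


def local_grad_sync(model, local_group):
    """Blocking node-local gradient averaging after every backward pass
    (cheap: stays on the fast intra-node GPU interconnect)."""
    size = dist.get_world_size(group=local_group)
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, group=local_group)
            p.grad.div_(size)


def start_global_sync(model):
    """Non-blocking global averaging of a parameter snapshot
    (expensive: crosses the inter-node network, so issued only every few steps)."""
    world = dist.get_world_size()
    buffers, handles = [], []
    for p in model.parameters():
        buf = p.detach().clone().div_(world)   # pre-scale so the SUM is a mean
        buffers.append(buf)
        handles.append(dist.all_reduce(buf, async_op=True))
    return buffers, handles


def finish_global_sync(model, buffers, handles, blend=0.5):
    """Called a few iterations later: wait for the (now slightly stale) global
    averages and blend them into the locally updated parameters."""
    for h in handles:
        h.wait()
    with torch.no_grad():
        for p, buf in zip(model.parameters(), buffers):
            p.mul_(1.0 - blend).add_(buf, alpha=blend)
```

In a training loop, local_grad_sync would run after every loss.backward(), start_global_sync only every few batches, and finish_global_sync one or two batches later, so the slow inter-node all-reduce overlaps with continued local training. DASO additionally adapts how often the global synchronization is triggered as training progresses, which this sketch leaves fixed.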