Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

Cited by: 4
Authors
Coquelin, Daniel [1 ]
Debus, Charlotte [1 ]
Goetz, Markus [1 ]
von der Lehr, Fabrice [2 ]
Kahn, James [1 ]
Siggel, Martin [2 ]
Streit, Achim [1 ]
Affiliations
[1] Karlsruhe Inst Technol, Hermann Von Helmholtz Pl 1, D-76344 Eggenstein Leopoldshafen, Germany
[2] German Aerosp Ctr, D-51147 Cologne, Germany
Keywords
Machine learning; Neural networks; Data parallel training; Multi-node; Multi-GPU; Stale gradients
DOI
10.1186/s40537-021-00556-1
Chinese Library Classification (CLC)
TP301 [Theory and methods]
Discipline code
081202
Abstract
With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the distributed asynchronous and selective optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks, while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, compared to current optimized data parallel training methods.
Pages: 18
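
The abstract above only outlines the training scheme, so the following sketch illustrates, using PyTorch's torch.distributed API, what a hierarchical gradient synchronization of this kind can look like: gradients are averaged synchronously within each multi-GPU node after every forward-backward pass, while the cross-node average is issued as a non-blocking all-reduce only every few iterations. The group layout, the helper names build_local_group and hierarchical_grad_sync, and the fixed global_sync_every interval are illustrative assumptions rather than the authors' DASO implementation, which additionally adapts the global synchronization rate during training and works with stale gradients.

# Minimal sketch of hierarchical, partly asynchronous gradient averaging,
# assuming one process per GPU launched with torchrun and an already
# initialised process group, e.g. dist.init_process_group(backend="nccl").
import torch.distributed as dist


def build_local_group(gpus_per_node: int):
    # One process group per compute node; every rank must participate in every
    # new_group call, but each rank only keeps the group that contains it.
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    local_group = None
    for node in range(world_size // gpus_per_node):
        ranks = list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        group = dist.new_group(ranks=ranks)
        if rank in ranks:
            local_group = group
    return local_group


def hierarchical_grad_sync(model, local_group, step, global_sync_every=4):
    local_size = dist.get_world_size(group=local_group)
    world_size = dist.get_world_size()

    # Blocking node-local averaging over the fast intra-node interconnect;
    # this runs after every forward-backward pass.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=local_group)
            p.grad.div_(local_size)

    # Periodic cross-node averaging, issued as non-blocking operations.
    # Summing the node-averaged gradients over all ranks and dividing by the
    # world size reproduces the global average.
    if step % global_sync_every == 0:
        handles = [dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, async_op=True)
                   for p in model.parameters() if p.grad is not None]
        for handle in handles:
            handle.wait()  # an overlapping scheme would defer this wait and
                           # consume the (slightly stale) result in a later step
        for p in model.parameters():
            if p.grad is not None:
                p.grad.div_(world_size)

Keeping the frequent averaging on the fast intra-node links and relegating the expensive inter-node traffic to occasional, overlappable operations targets exactly the blocking-synchronization bottleneck described in the abstract.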