Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

Cited by: 4
Authors
Coquelin, Daniel [1]
Debus, Charlotte [1]
Goetz, Markus [1]
von der Lehr, Fabrice [2]
Kahn, James [1]
Siggel, Martin [2]
Streit, Achim [1]
Affiliations
[1] Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen, Germany
[2] German Aerospace Center (DLR), D-51147 Cologne, Germany
Keywords
Machine learning; Neural networks; Data parallel training; Multi-node; Multi-GPU; Stale gradients
DOI
10.1186/s40537-021-00556-1
CLC number (Chinese Library Classification)
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the distributed asynchronous and selective optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks, and it adjusts the global synchronization rate during the learning process. We show that DASO reduces training time by up to 34% on classical and state-of-the-art networks compared to current optimized data parallel training methods.
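The hierarchical scheme described in the abstract can be pictured concretely. Below is a minimal PyTorch sketch, assuming torch.distributed has already been initialized with one process per GPU; the group construction, the leader-rank convention, and the fixed global_every interval are illustrative assumptions, not the authors' implementation. Note that DASO additionally overlaps the asynchronous inter-node all-reduce with subsequent local steps and applies the resulting stale gradients later, whereas this sketch waits on it immediately for brevity.

    import torch
    import torch.distributed as dist

    def build_groups(gpus_per_node: int):
        """One process group per node, plus a 'leader' group spanning the
        first rank of every node. Every rank must call new_group() for
        every group, including groups it does not belong to."""
        world, rank = dist.get_world_size(), dist.get_rank()
        local_group = None
        for n in range(world // gpus_per_node):
            g = dist.new_group(list(range(n * gpus_per_node, (n + 1) * gpus_per_node)))
            if n == rank // gpus_per_node:
                local_group = g
        global_group = dist.new_group(list(range(0, world, gpus_per_node)))
        return local_group, global_group

    def hierarchical_sync(model, step, local_group, global_group,
                          gpus_per_node, n_nodes, global_every=4):
        """Average gradients node-locally on every step; average across
        nodes (via the leader ranks) only every `global_every` steps."""
        rank = dist.get_rank()
        leader = (rank // gpus_per_node) * gpus_per_node  # first rank on this node
        grads = [p.grad for p in model.parameters() if p.grad is not None]

        # 1) Fast intra-node all-reduce (NVLink/PCIe) on every step.
        for g in grads:
            dist.all_reduce(g, group=local_group)
            g.div_(gpus_per_node)

        # 2) Occasional inter-node all-reduce among node leaders only;
        #    async_op=True returns handles, so the slow network transfer
        #    can overlap other work before we wait on it.
        if step % global_every == 0:
            if rank == leader:
                handles = [dist.all_reduce(g, group=global_group, async_op=True)
                           for g in grads]
                for h in handles:
                    h.wait()
                for g in grads:
                    g.div_(n_nodes)
            # 3) Leaders push the globally averaged gradients back to the
            #    other ranks on their node.
            for g in grads:
                dist.broadcast(g, src=leader, group=local_group)

In a training loop, such a routine would be called between loss.backward() and optimizer.step(); the selective aspect of DASO then corresponds to varying the global synchronization rate (here the fixed global_every) as training progresses.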
Pages: 18
Related papers
50 items in total (items [21]-[30] shown)
  • [21] A Distributed Particle Swarm Optimization Algorithm Based on Apache Spark for Asynchronous Parallel Training of Deep Neural Networks
Capel, Manuel I.
    Holgado-Terriza, Juan A.
    Galiana-Velasco, Sergio
    Salguero, Alberto G.
    53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024, 2024: 76-85
  • [22] Global optimization for neural network training
    Shang, Y.
    Wah, B. W.
    COMPUTER, 1996, 29(3): 45+
  • [23] Distributed Graph Neural Network Training: A Survey
    Shao, Yingxia
    Li, Hongzheng
    Gu, Xizhi
    Yin, Hongbo
    Li, Yawen
    Miao, Xupeng
    Zhang, Wentao
    Cui, Bin
    Chen, Lei
    ACM COMPUTING SURVEYS, 2024, 56(8)
  • [24] NeuralGenesis: a software for distributed neural network training
    Tsoulos, Ioannis
    Tzallas, Alexandros T.
    Tsalikakis, Dimitrios G.
    Giannakeas, Nikolaos
    Tsipouras, Markos G.
    Androulidakis, Iosif
    Zaitseva, Elena
    2016 24TH TELECOMMUNICATIONS FORUM (TELFOR), 2016: 841-844
  • [25] GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
    Cai, Tianle
    Luo, Shengjie
    Xu, Keyulu
    He, Di
    Liu, Tie-Yan
    Wang, Liwei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML), VOL 139, 2021
  • [26] fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU
    Chen, Zhaodong
    Yan, Mingyu
    Zhu, Maohua
    Deng, Lei
    Li, Guoqi
    Li, Shuangchen
    Xie, Yuan
    2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2020
  • [27] Accelerating Neural Network Training with Processing-in-Memory GPU
    Fei, Xiang
    Han, Jianhui
    Huang, Jianqiang
    Zheng, Weimin
    Zhang, Youhui
    2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022), 2022: 414-421
  • [28] An Artificial Neural Network for Distributed Constrained Optimization
    Liu, Na
    Jia, Wenwen
    Qin, Sitian
    Li, Guocheng
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302: 430-441
  • [29] Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
    Kungurtsev, Vyacheslav
    Egan, Malcolm
    Chatterjee, Bapi
    Alistarh, Dan
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 8209-8216
  • [30] Accelerating deep neural network training for action recognition on a cluster of GPUs
    Cong, Guojing
    Domeniconi, Giacomo
    Shapiro, Joshua
    Zhou, Fan
    Chen, Barry
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018: 298-305