Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks

Cited by: 0
Authors
Li, Junyu [1 ]
He, Ligang [1 ]
Ren, Shenyuan [2 ]
Mao, Rui [3 ]
Affiliations
[1] Univ Warwick, Coventry, W Midlands, England
[2] Univ Oxford, Oxford, England
[3] Shenzhen Univ, Shenzhen, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Neural Networks; Distributed Training; Machine Learning;
DOI
10.1145/3404397.3404432
CLC Number (Chinese Library Classification)
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Training deep neural networks is a computation-intensive and time-consuming task. Asynchronous Stochastic Gradient Descent (ASGD) is an effective way to accelerate training because it allows the network to be trained in a distributed fashion, but it suffers from the problem of delayed gradient updates. A recent notable work, DC-ASGD, improves the performance of ASGD by compensating for the delay using a cheap approximation of the Hessian matrix. DC-ASGD works well when the delay is short; however, its performance drops considerably as the delay between the workers and the parameter server grows. In real-life large-scale distributed training, the gradient delay experienced by a worker is usually long and volatile. In this paper, we propose a novel algorithm called LC-ASGD that compensates for the delay based on loss prediction, effectively extending the delay duration that the compensation mechanism can tolerate. Specifically, LC-ASGD maintains additional models on the parameter server that predict each worker's loss from its historical losses and use the prediction to compensate for the delay. The algorithm is evaluated on popular networks and benchmark datasets. The experimental results show that LC-ASGD improves significantly over existing methods, especially when the networks are trained with a large number of workers.
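The abstract only outlines the mechanism, but the idea can be illustrated with a toy parameter-server loop. The Python/NumPy sketch below is a hypothetical rendering, not the paper's actual method: the `LossPredictor` class, the linear extrapolation of each worker's loss history, and the `predicted / loss` damping factor are all illustrative assumptions introduced here for clarity.

```python
import numpy as np
from collections import defaultdict

class LossPredictor:
    """Per-worker loss predictor (illustrative stand-in: the paper's
    actual prediction model is not specified in the abstract). Fits a
    linear trend to the worker's recent loss history and extrapolates."""

    def __init__(self, window=10):
        self.window = window
        self.history = []  # (global step, observed loss) pairs

    def record(self, step, loss):
        self.history.append((step, loss))
        self.history = self.history[-self.window:]

    def predict(self, step):
        # Not enough history to fit a trend: fall back to the last loss.
        if len(self.history) < 2:
            return self.history[-1][1] if self.history else None
        steps, losses = zip(*self.history)
        slope, intercept = np.polyfit(steps, losses, 1)
        return slope * step + intercept


class ParameterServer:
    """Toy asynchronous-SGD parameter server with loss-based delay
    compensation (a hypothetical form of the LC-ASGD idea)."""

    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)   # shared model parameters
        self.step = 0            # global update counter
        self.lr = lr
        self.predictors = defaultdict(LossPredictor)

    def pull(self):
        # A worker fetches the parameters plus the step they correspond
        # to, so staleness is measurable when the gradient comes back.
        return self.w.copy(), self.step

    def push(self, worker_id, grad, loss, step_seen):
        # The gradient was computed against parameters from `step_seen`,
        # which may be far behind `self.step` (the delayed-gradient issue).
        pred = self.predictors[worker_id]
        pred.record(step_seen, loss)

        # Compensate: damp the stale gradient by the ratio of the
        # predicted current loss to the loss the worker observed
        # (an assumed compensation rule, chosen only for illustration).
        predicted = pred.predict(self.step)
        scale = 1.0
        if predicted is not None and loss > 1e-12:
            scale = float(np.clip(predicted / loss, 0.0, 1.0))

        self.w -= self.lr * scale * grad
        self.step += 1
```

For contrast, DC-ASGD adds a Hessian-approximation correction term to the stale gradient, whereas this sketch has the loss prediction rescale it, purely to keep the example short. The paper itself (DOI above) should be consulted for the actual LC-ASGD update rule and predictor architecture.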
Pages: 10
Related Papers
50 records
  • [1] An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent
    Tu, Jun
    Zhou, Jia
    Ren, Donglin
    [J]. COMPUTER COMMUNICATIONS, 2022, 195 : 416 - 423
  • [2] Improving Training Time of Deep Neural Network With Asynchronous Averaged Stochastic Gradient Descent
    You, Zhao
    Xu, Bo
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 446 - 449
  • [3] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    [J]. JOURNAL OF COMPLEXITY, 2021, 64
  • [4] DAC-SGD: A Distributed Stochastic Gradient Descent Algorithm Based on Asynchronous Connection
    He, Aijia
    Chen, Zehong
    Li, Weichen
    Li, Xingying
    Li, Hongjun
    Zhao, Xin
[J]. IIP'17: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING, 2017
  • [5] Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
    Vasudevan, Shrihari
    [J]. ENTROPY, 2020, 22 (05)
  • [6] Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
    Cui, Xiaodong
    Zhang, Wei
    Tuske, Zoltan
    Picheny, Michael
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [7] Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
    Teng, Yunfei
    Gao, Wenbo
    Chalus, Francois
    Choromanska, Anna
    Goldfarb, Donald
    Weller, Adrian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] Explicit loss asymptotics in the gradient descent training of neural networks
    Velikanov, Maksim
    Yarotsky, Dmitry
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Gradient Descent Analysis: On Visualizing the Training of Deep Neural Networks
    Becker, Martin
    Lippel, Jens
    Zielke, Thomas
    [J]. PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL 3: IVAPP, 2019, : 338 - 345
  • [10] Distributed stochastic gradient descent for link prediction in signed social networks
    Zhang, Han
    Wu, Gang
    Ling, Qing
    [J]. EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2019, 2019 (1)