Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks

Cited by: 0
Authors
Li, Junyu [1 ]
He, Ligang [1 ]
Ren, Shenyuan [2 ]
Mao, Rui [3 ]
Affiliations
[1] Univ Warwick, Coventry, W Midlands, England
[2] Univ Oxford, Oxford, England
[3] Shenzhen Univ, Shenzhen, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Neural Networks; Distributed Training; Machine Learning;
DOI
10.1145/3404397.3404432
CLC Number (Chinese Library Classification)
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Training deep neural networks is a computation-intensive and time-consuming task. Asynchronous Stochastic Gradient Descent (ASGD) is an effective way to accelerate training because it allows the network to be trained in a distributed fashion, but it suffers from the problem of delayed gradient updates. A recent notable work, DC-ASGD, improves the performance of ASGD by compensating for the delay using a cheap approximation of the Hessian matrix. DC-ASGD works well when the delay is short; however, its performance drops considerably as the delay between the workers and the parameter server grows. In real-life large-scale distributed training, the gradient delay experienced by a worker is usually long and volatile. In this paper, we propose a novel algorithm called LC-ASGD that compensates for the delay based on loss prediction, effectively extending the delay duration that the compensation mechanism can tolerate. Specifically, LC-ASGD maintains additional models on the parameter server that predict each worker's loss from its historical losses and use the prediction to compensate for the delay. The algorithm is evaluated on popular networks and benchmark datasets. The experimental results show that LC-ASGD improves significantly over existing methods, especially when the networks are trained with a large number of workers.
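The abstract only outlines the mechanism, but the idea can be illustrated with a toy parameter-server loop. The Python/NumPy sketch below is a hypothetical rendering, not the paper's actual method: the `LossPredictor` class, the linear extrapolation of each worker's loss history, and the `predicted / loss` damping factor are all illustrative assumptions introduced here for clarity.

```python
import numpy as np
from collections import defaultdict

class LossPredictor:
    """Per-worker loss predictor (illustrative stand-in: the paper's
    actual prediction model is not specified in the abstract). Fits a
    linear trend to the worker's recent loss history and extrapolates."""

    def __init__(self, window=10):
        self.window = window
        self.history = []  # (global step, observed loss) pairs

    def record(self, step, loss):
        self.history.append((step, loss))
        self.history = self.history[-self.window:]

    def predict(self, step):
        # Not enough history to fit a trend: fall back to the last loss.
        if len(self.history) < 2:
            return self.history[-1][1] if self.history else None
        steps, losses = zip(*self.history)
        slope, intercept = np.polyfit(steps, losses, 1)
        return slope * step + intercept


class ParameterServer:
    """Toy asynchronous-SGD parameter server with loss-based delay
    compensation (a hypothetical form of the LC-ASGD idea)."""

    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)   # shared model parameters
        self.step = 0            # global update counter
        self.lr = lr
        self.predictors = defaultdict(LossPredictor)

    def pull(self):
        # A worker fetches the parameters plus the step they correspond
        # to, so staleness is measurable when the gradient comes back.
        return self.w.copy(), self.step

    def push(self, worker_id, grad, loss, step_seen):
        # The gradient was computed against parameters from `step_seen`,
        # which may be far behind `self.step` (the delayed-gradient issue).
        pred = self.predictors[worker_id]
        pred.record(step_seen, loss)

        # Compensate: damp the stale gradient by the ratio of the
        # predicted current loss to the loss the worker observed
        # (an assumed compensation rule, chosen only for illustration).
        predicted = pred.predict(self.step)
        scale = 1.0
        if predicted is not None and loss > 1e-12:
            scale = float(np.clip(predicted / loss, 0.0, 1.0))

        self.w -= self.lr * scale * grad
        self.step += 1
```

For contrast, DC-ASGD adds a Hessian-approximation correction term to the stale gradient, whereas this sketch has the loss prediction rescale it, purely to keep the example short. The paper itself (DOI above) should be consulted for the actual LC-ASGD update rule and predictor architecture.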
Pages: 10
Related Papers
50 records
  • [1] An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent
    Tu, Jun
    Zhou, Jia
    Ren, Donglin
    [J]. COMPUTER COMMUNICATIONS, 2022, 195 : 416 - 423
  • [2] Improving Training Time of Deep Neural Network With Asynchronous Averaged Stochastic Gradient Descent
    You, Zhao
    Xu, Bo
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 446 - 449
  • [3] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    [J]. JOURNAL OF COMPLEXITY, 2021, 64
  • [4] DAC-SGD: A Distributed Stochastic Gradient Descent Algorithm Based on Asynchronous Connection
    He, Aijia
    Chen, Zehong
    Li, Weichen
    Li, Xingying
    Li, Hongjun
    Zhao, Xin
[J]. IIP'17: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING, 2017
  • [5] Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
    Vasudevan, Shrihari
    [J]. ENTROPY, 2020, 22 (05)
  • [6] Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
    Cui, Xiaodong
    Zhang, Wei
    Tuske, Zoltan
    Picheny, Michael
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [7] Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
    Teng, Yunfei
    Gao, Wenbo
    Chalus, Francois
    Choromanska, Anna
    Goldfarb, Donald
    Weller, Adrian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] Explicit loss asymptotics in the gradient descent training of neural networks
    Velikanov, Maksim
    Yarotsky, Dmitry
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Gradient Descent Analysis: On Visualizing the Training of Deep Neural Networks
    Becker, Martin
    Lippel, Jens
    Zielke, Thomas
    [J]. PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL 3: IVAPP, 2019, : 338 - 345
  • [10] Distributed stochastic gradient descent for link prediction in signed social networks
    Zhang, Han
    Wu, Gang
    Ling, Qing
    [J]. EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2019, 2019 (1)