MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent

Cited by: 0
Authors
Backstrom, Karl [1]
Papatriantafilou, Marina [1]
Tsigas, Philippas [1]
Affiliations
[1] Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden
DOI
10.1109/bigdata47090.2019.9006054
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently, there has been significant work on understanding the parallelism inherent to SGD and its convergence properties. Asynchronous, parallel SGD (AsyncPSGD) has received particular attention, due to observed performance benefits. On the other hand, asynchrony implies inherent challenges in understanding the execution of the algorithm and its convergence, stemming from the fact that the contribution of a thread might be based on an old (stale) view of the state. In this work we aim to deepen the understanding of AsyncPSGD in order to increase the statistical efficiency in the presence of stale gradients. We propose new models for capturing the nature of the staleness distribution in a practical setting. Using the proposed models, we derive a staleness-adaptive SGD framework, MindTheStep-AsyncPSGD, for adapting the step size in an online fashion, which provably reduces the negative impact of asynchrony. Moreover, we provide general convergence time bounds for a wide class of staleness-adaptive step size strategies for convex target functions. We also provide a detailed empirical study, showing how our approach yields faster convergence for deep learning applications.
Pages: 16-25
Page count: 10
Related papers
50 in total
  • [41] Bandwidth estimation for adaptive optical systems based on stochastic parallel gradient descent optimization
    Yu, M
    Vorontsov, MA
    ADVANCED WAVEFRONT CONTROL: METHODS, DEVICES, AND APPLICATIONS II, 2004, 5553 : 189 - 199
  • [42] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [43] Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
    Bogoychev, Nikolay
    Junczys-Dowmunt, Marcin
    Heafield, Kenneth
    Aji, Alham Fikri
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2991 - 2996
  • [44] On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
    Reddi, Sashank J.
    Hefny, Ahmed
    Sra, Suvrit
    Poczos, Barnabas
    Smola, Alex
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [45] Decentralized Asynchronous Stochastic Gradient Descent: Convergence Rate Analysis
    Bedi, Amrit Singh
    Pradhan, Hrusikesha
    Rajawat, Ketan
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 402 - 406
  • [46] Fast Convergence Stochastic Parallel Gradient Descent Algorithm
    Hu Dongting
    Shen Wen
    Ma Wenchao
    Liu Xinyu
    Su Zhouping
    Zhu Huaxin
    Zhang Xiumei
    Que Lizhi
    Zhu Zhuowei
    Zhang Yixin
    Chen Guoqing
    Hu Lifa
    LASER & OPTOELECTRONICS PROGRESS, 2019, 56 (12)
  • [47] Adaptive Beamforming Based On Stochastic Parallel Gradient Descent Algorithm For Single Receiver Phased Array
    Zhao, Haijun
    Zhang, Jing
    Yin, Zhiping
    2014 2ND INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 2014, : 849 - 853
  • [48] Stochastic parallel gradient descent based adaptive optics used for a high contrast imaging coronagraph
    Dong, Bing
    Ren, De-Qing
    Zhang, Xi
    RESEARCH IN ASTRONOMY AND ASTROPHYSICS, 2011, 11 (08) : 997 - 1002
  • [50] Adaptive wavefront correction: a hybrid VLSI/optical system implementing parallel stochastic gradient descent
    Cohen, MH
    Vorontsov, M
    Carhart, G
    Cauwenberghs, G
    OPTICS IN ATMOSPHERIC PROPAGATION AND ADAPTIVE SYSTEMS III, 1999, 3866 : 176 - 182