A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

Times Cited: 0
Authors
Arjevani, Yossi [1]
Shamir, Ohad [2]
Srebro, Nathan [3]
Affiliations
[1] NYU, New York, NY 10003 USA
[2] Weizmann Inst Sci, Rehovot, Israel
[3] Toyota Technol Inst, Chicago, IL USA
Source
Keywords
optimization; stochastic gradient descent; delayed; asynchronous; upper bounds; lower bounds
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We establish matching upper and lower complexity bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from tau rounds ago. First, we show that without stochastic noise, delays strongly affect the attainable optimization error: in fact, the error can be as large as that of non-delayed gradient descent run on only a 1/tau fraction of the gradients. In sharp contrast, we quantify how stochastic noise makes the effect of delays negligible, improving on previous work which showed this phenomenon only asymptotically or only for much smaller delays. Moreover, in the context of distributed optimization, the results indicate that the performance of gradient descent with delays is competitive with synchronous approaches such as mini-batching. Our results are based on a novel technique for analyzing the convergence of optimization algorithms using generating functions.
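The setting described in the abstract can be illustrated with a minimal Python sketch (not the authors' code): delayed SGD on a random strongly convex quadratic f(x) = 0.5 x^T A x, where the gradient applied at round t is evaluated at the iterate from tau rounds earlier and perturbed by Gaussian noise. All concrete choices below (the matrix A, delay tau, step size eta, noise level sigma, horizon T) are illustrative assumptions, not values from the paper.

# Minimal illustrative sketch: delayed SGD on a quadratic f(x) = 0.5 * x^T A x
# (minimum at x = 0). The gradient used at round t comes from the iterate
# tau rounds ago. A, tau, eta, sigma, T are assumed, illustrative values.
import numpy as np

rng = np.random.default_rng(0)
d, tau, eta, sigma, T = 10, 5, 0.02, 0.1, 2000

M = rng.standard_normal((d, d))
A = M @ M.T / d + np.eye(d)          # random positive-definite Hessian

x = rng.standard_normal(d)
buffer = [x.copy()] * (tau + 1)      # iterates from the last tau+1 rounds

for t in range(T):
    x_stale = buffer[0]                                   # iterate from tau rounds ago
    grad = A @ x_stale + sigma * rng.standard_normal(d)   # delayed stochastic gradient
    x = x - eta * grad
    buffer = buffer[1:] + [x.copy()]                      # slide the delay window

print("suboptimality f(x_T) - f(x*):", 0.5 * x @ A @ x)

Increasing tau with eta held fixed slows the noiseless contraction (the delay penalty the paper quantifies), while for sigma > 0 the noise-dominated phase is largely unaffected by the delay, in line with the abstract's claim.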
Pages: 111 - 132
Page count: 22
Related Papers
50 records
  • [1] On Convergence of Gradient Descent Ascent: A Tight Local Analysis
    Li, Haochuan
    Farnia, Farzan
    Das, Subhro
    Jadbabaie, Ali
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [2] Convergence analysis of gradient descent stochastic algorithms
    Shapiro, A
    Wardi, Y
    [J]. JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1996, 91 (02) : 439 - 454
  • [3] Convergence analysis of distributed stochastic gradient descent with shuffling
    Meng, Qi
    Chen, Wei
    Wang, Yue
    Ma, Zhi-Ming
    Liu, Tie-Yan
    [J]. NEUROCOMPUTING, 2019, 337 : 46 - 57
  • [4] Stochastic gradient descent with differentially private updates
    Song, Shuang
    Chaudhuri, Kamalika
    Sarwate, Anand D.
    [J]. 2013 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2013, : 245 - 248
  • [5] Convergence of Stochastic Gradient Descent for PCA
    Shamir, Ohad
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [6] Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
    Berthier, Raphael
    Bach, Francis
    Gaillard, Pierre
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [7] Decentralized Asynchronous Stochastic Gradient Descent: Convergence Rate Analysis
    Bedi, Amrit Singh
    Pradhan, Hrusikesha
    Rajawat, Ketan
    [J]. 2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 402 - 406
  • [8] Tight Convergence Rate of Gradient Descent for Eigenvalue Computation
    Ding, Qinghua
    Zhou, Kaiwen
    Cheng, James
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3276 - 3282
  • [9] On the convergence and improvement of stochastic normalized gradient descent
    Zhao, Shen-Yi
    Xie, Yin-Peng
    Li, Wu-Jun
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (03) : 105 - 117
  • [10] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    [J]. arXiv, 2019