On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

Cited by: 0
Authors
Li, Xiaoyu [1 ]
Orabona, Francesco [1 ]
Affiliation
[1] Boston Univ, Boston, MA 02215 USA
Funding
U.S. National Science Foundation
Keywords
SUBGRADIENT METHODS
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient descent (SGD) is the method of choice for large-scale optimization of machine learning objective functions. Yet, its performance varies greatly and depends heavily on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we start closing this gap: we theoretically analyze, in both the convex and non-convex settings, a generalized version of the AdaGrad stepsizes. We give sufficient conditions for these stepsizes to achieve almost sure asymptotic convergence of the gradients to zero, proving the first such guarantee for generalized AdaGrad stepsizes in the non-convex setting. Moreover, we show that these stepsizes allow SGD to adapt automatically to the noise level of the stochastic gradients in both the convex and non-convex settings, interpolating between O(1/T) and O(1/√T) rates, up to logarithmic terms.
Pages: 10
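
The abstract describes SGD with a generalized AdaGrad stepsize. Below is a minimal Python sketch of one common norm-based form of such a stepsize, eta_t = alpha / (beta + sum of past squared gradient norms)^(1/2 + eps); the function name, the parameters alpha, beta, and eps, and the exact variant (global vs. per-coordinate, accumulator indexing, admissible range of eps) are illustrative assumptions, not necessarily the precise algorithm analyzed in the paper.

```python
import numpy as np

def sgd_generalized_adagrad(grad, x0, alpha=1.0, beta=1.0, eps=0.1, n_steps=1000):
    """SGD with a generalized AdaGrad stepsize (norm version; sketch only).

    Assumed stepsize: eta_t = alpha / (beta + sum_{i<t} ||g_i||^2)^(1/2 + eps),
    computed from past gradients only, so eta_t is independent of the current
    stochastic gradient g_t (a choice that simplifies the analysis).
    """
    x = np.asarray(x0, dtype=float)
    accum = 0.0  # running sum of squared stochastic-gradient norms
    for _ in range(n_steps):
        eta = alpha / (beta + accum) ** (0.5 + eps)  # stepsize from past gradients only
        g = grad(x)                                  # stochastic gradient oracle
        x = x - eta * g
        accum += float(np.dot(g, g))                 # update accumulator after the step
    return x

# Usage: noisy gradients of f(x) = 0.5 * ||x||^2; iterates should approach 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(sgd_generalized_adagrad(noisy_grad, x0=np.ones(5)))
```

Note the design choice in the sketch: with eps = 0 the stepsize reduces to the standard AdaGrad-norm stepsize, while eps > 0 makes the stepsize decay faster, which is what enables the almost sure convergence and noise-adaptive rates discussed in the abstract.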