Linear Convergence of Adaptive Stochastic Gradient Descent

Cited by: 0
Authors
Xie, Yuege [1 ]
Wu, Xiaoxia [2 ]
Ward, Rachel [1 ,2 ]
Affiliations
[1] UT Austin, Oden Inst, Austin, TX 78712 USA
[2] UT Austin, Dept Math, Austin, TX USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of strongly convex functions, and of non-convex functions satisfying the Polyak-Lojasiewicz (PL) inequality. The paper introduces the notion of Restricted Uniform Inequality of Gradients (RUIG), a measure of the balancedness of the stochastic gradient norms, to depict the landscape of a function. RUIG plays a key role in proving the robustness of AdaGrad-Norm to its hyper-parameter tuning in the stochastic setting. On top of RUIG, we develop a two-stage framework to prove the linear convergence of AdaGrad-Norm without knowledge of the parameters of the objective functions. This framework can likely be extended to other adaptive stepsize algorithms. Numerical experiments validate the theory and suggest future directions for improvement.
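For concreteness, below is a minimal Python sketch of the AdaGrad-Norm update the abstract refers to: a single scalar accumulator collects the squared norms of the stochastic gradients, and its square root divides a fixed base stepsize. The toy least-squares problem and the values of eta and b0 are illustrative assumptions, not values from the paper.

import numpy as np

# Minimal AdaGrad-Norm sketch on f(x) = 0.5 * ||A x - y||^2.
# Problem data, eta, and b0 are illustrative, not from the paper.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
y = A @ rng.standard_normal(10)       # consistent system, so min f = 0

x = np.zeros(10)
eta = 1.0                             # base stepsize, deliberately untuned
b_sq = 0.01 ** 2                      # accumulator b_0^2 > 0

for k in range(500):
    i = rng.integers(0, A.shape[0])   # sample one row: stochastic gradient
    g = (A[i] @ x - y[i]) * A[i]
    b_sq += g @ g                     # b_{k+1}^2 = b_k^2 + ||g_k||^2
    x -= (eta / np.sqrt(b_sq)) * g    # x_{k+1} = x_k - (eta / b_{k+1}) g_k

print("final loss:", 0.5 * np.linalg.norm(A @ x - y) ** 2)

Because b_sq only grows, the effective stepsize eta / sqrt(b_sq) adapts downward automatically, which is why no knowledge of the smoothness or PL constants is needed to run the method.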
Pages: 10