On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

Cited by: 0
Authors
Li, Xiaoyu [1 ]
Orabona, Francesco [1 ]
Affiliation
[1] Boston Univ, Boston, MA 02215 USA
Funding
U.S. National Science Foundation
Keywords
SUBGRADIENT METHODS
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient descent (SGD) is the method of choice for large-scale optimization of machine learning objective functions. Yet, its performance varies greatly and depends heavily on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we start closing this gap: we theoretically analyze, in both the convex and non-convex settings, a generalized version of the AdaGrad stepsizes. We give sufficient conditions for these stepsizes to achieve almost sure asymptotic convergence of the gradients to zero, proving the first such guarantee for generalized AdaGrad stepsizes in the non-convex setting. Moreover, we show that these stepsizes allow SGD to adapt automatically to the noise level of the stochastic gradients in both the convex and non-convex settings, interpolating between O(1/T) and O(1/√T) rates, up to logarithmic terms.
Pages: 10
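
The abstract describes SGD with a generalized AdaGrad stepsize. Below is a minimal Python sketch of one common norm-based form of such a stepsize, eta_t = alpha / (beta + sum of past squared gradient norms)^(1/2 + eps); the function name, the parameters alpha, beta, and eps, and the exact variant (global vs. per-coordinate, accumulator indexing, admissible range of eps) are illustrative assumptions, not necessarily the precise algorithm analyzed in the paper.

```python
import numpy as np

def sgd_generalized_adagrad(grad, x0, alpha=1.0, beta=1.0, eps=0.1, n_steps=1000):
    """SGD with a generalized AdaGrad stepsize (norm version; sketch only).

    Assumed stepsize: eta_t = alpha / (beta + sum_{i<t} ||g_i||^2)^(1/2 + eps),
    computed from past gradients only, so eta_t is independent of the current
    stochastic gradient g_t (a choice that simplifies the analysis).
    """
    x = np.asarray(x0, dtype=float)
    accum = 0.0  # running sum of squared stochastic-gradient norms
    for _ in range(n_steps):
        eta = alpha / (beta + accum) ** (0.5 + eps)  # stepsize from past gradients only
        g = grad(x)                                  # stochastic gradient oracle
        x = x - eta * g
        accum += float(np.dot(g, g))                 # update accumulator after the step
    return x

# Usage: noisy gradients of f(x) = 0.5 * ||x||^2; iterates should approach 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(sgd_generalized_adagrad(noisy_grad, x0=np.ones(5)))
```

Note the design choice in the sketch: with eps = 0 the stepsize reduces to the standard AdaGrad-norm stepsize, while eps > 0 makes the stepsize decay faster, which is what enables the almost sure convergence and noise-adaptive rates discussed in the abstract.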