Optimized convergence of stochastic gradient descent by weighted averaging

Cited: 0
Authors
Hagedorn, Melinda [1 ]
Jarre, Florian [1 ]
Affiliations
[1] Heinrich Heine Univ, Math Nat Fak, Dusseldorf, Germany
Keywords
Convex optimization; stochastic gradient descent; weighted averaging; noise; optimal step lengths; optimal weights;
DOI
10.1080/10556788.2024.2306383
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202; 0835
Abstract
Under mild assumptions, stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as the approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably more slowly to the optimal solution than the iterates themselves. Moreover, even in the presence of noise, when the stochastic gradient method is terminated after a finite number of steps, the arithmetic mean is not necessarily the best possible approximation to the unknown optimal solution. This paper aims at identifying optimal strategies in a particularly simple case: the minimization of a strongly convex function with i.i.d. noise terms and termination after a finite number of steps. Explicit formulas for the stochastic error and the optimality error are derived as functions of certain parameters of the SGD method. The aim was to choose these parameters such that both the stochastic error and the optimality error are reduced compared with arithmetic averaging. This aim could not be achieved; however, by allowing a slight increase of the stochastic error, the parameters could be selected so that the optimality error is reduced significantly. This reduction of the optimality error has a strong effect on the approximate solution generated by the stochastic gradient method when only a moderate number of iterations is used or when the initial error is large. The numerical examples confirm the theoretical results and suggest that a generalization to non-quadratic objective functions may be possible.
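
As a rough illustration of the comparison described in the abstract, the following Python sketch runs SGD on a strongly convex quadratic with i.i.d. Gaussian gradient noise and, after a fixed finite number of steps, compares the last iterate, the arithmetic mean of all iterates, and a weighted average of the iterates. The quadratic, the step-length schedule 1/(k+1), and the polynomially increasing weights (k+1)^2 are illustrative assumptions only; they are not the optimized step lengths and weights derived in the paper.

```python
# A minimal numerical sketch (not the paper's exact scheme): SGD on a strongly
# convex quadratic f(x) = 0.5 * x'Ax with i.i.d. Gaussian gradient noise,
# terminated after a fixed finite number of steps.  It compares the last
# iterate, the arithmetic mean of all iterates, and a weighted average with
# polynomially increasing weights.
import numpy as np

rng = np.random.default_rng(0)

n, T, sigma = 20, 500, 0.1                 # dimension, number of steps, noise level
A = np.diag(np.linspace(0.5, 1.0, n))      # eigenvalues in [0.5, 1]; minimizer is x* = 0
x = 10.0 * np.ones(n)                      # starting point with large initial error

sum_plain = np.zeros(n)                    # accumulator for the arithmetic mean
sum_weighted = np.zeros(n)                 # accumulator for the weighted average
weight_total = 0.0

for k in range(T):
    g = A @ x + sigma * rng.standard_normal(n)   # noisy gradient with i.i.d. noise terms
    x = x - g / (k + 1.0)                        # diminishing step lengths 1/(k+1) (illustrative choice)

    sum_plain += x
    w = (k + 1.0) ** 2                           # polynomially increasing weights (illustrative choice)
    sum_weighted += w * x
    weight_total += w

x_mean = sum_plain / T
x_weighted = sum_weighted / weight_total

def f(z):                                        # objective value; the optimal value is 0
    return 0.5 * z @ A @ z

print(f"last iterate    : f = {f(x):.3e}")
print(f"arithmetic mean : f = {f(x_mean):.3e}")
print(f"weighted average: f = {f(x_weighted):.3e}")
```

With a large initial error, the arithmetic mean is dragged towards the early iterates, whereas weights that grow with the iteration index downweight those early iterates; this is the effect the paper quantifies and optimizes.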
Pages: 26