Optimized convergence of stochastic gradient descent by weighted averaging

Cited: 0
Authors
Hagedorn, Melinda [1 ]
Jarre, Florian [1 ]
Affiliations
[1] Heinrich Heine Univ, Math Nat Fak, Dusseldorf, Germany
Keywords
Convex optimization; stochastic gradient descent; weighted averaging; noise; optimal step lengths; optimal weights;
DOI
10.1080/10556788.2024.2306383
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202; 0835
Abstract
Under mild assumptions, stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as the approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably more slowly to the optimal solution than the iterates themselves. Moreover, even in the presence of noise, when the stochastic gradient method is terminated after a finite number of steps, the arithmetic mean is not necessarily the best possible approximation to the unknown optimal solution. This paper aims at identifying optimal strategies in a particularly simple case: the minimization of a strongly convex function with i.i.d. noise terms and termination after a finite number of steps. Explicit formulas for the stochastic error and the optimality error are derived as functions of certain parameters of the SGD method. The aim was to choose these parameters such that both the stochastic error and the optimality error are reduced compared with arithmetic averaging. This aim could not be achieved; however, by allowing a slight increase of the stochastic error, the parameters could be selected so that the optimality error is reduced significantly. This reduction of the optimality error has a strong effect on the approximate solution generated by the stochastic gradient method when only a moderate number of iterations is used or when the initial error is large. The numerical examples confirm the theoretical results and suggest that a generalization to non-quadratic objective functions may be possible.
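
As a rough illustration of the comparison described in the abstract, the following Python sketch runs SGD on a strongly convex quadratic with i.i.d. Gaussian gradient noise and, after a fixed finite number of steps, compares the last iterate, the arithmetic mean of all iterates, and a weighted average of the iterates. The quadratic, the step-length schedule 1/(k+1), and the polynomially increasing weights (k+1)^2 are illustrative assumptions only; they are not the optimized step lengths and weights derived in the paper.

```python
# A minimal numerical sketch (not the paper's exact scheme): SGD on a strongly
# convex quadratic f(x) = 0.5 * x'Ax with i.i.d. Gaussian gradient noise,
# terminated after a fixed finite number of steps.  It compares the last
# iterate, the arithmetic mean of all iterates, and a weighted average with
# polynomially increasing weights.
import numpy as np

rng = np.random.default_rng(0)

n, T, sigma = 20, 500, 0.1                 # dimension, number of steps, noise level
A = np.diag(np.linspace(0.5, 1.0, n))      # eigenvalues in [0.5, 1]; minimizer is x* = 0
x = 10.0 * np.ones(n)                      # starting point with large initial error

sum_plain = np.zeros(n)                    # accumulator for the arithmetic mean
sum_weighted = np.zeros(n)                 # accumulator for the weighted average
weight_total = 0.0

for k in range(T):
    g = A @ x + sigma * rng.standard_normal(n)   # noisy gradient with i.i.d. noise terms
    x = x - g / (k + 1.0)                        # diminishing step lengths 1/(k+1) (illustrative choice)

    sum_plain += x
    w = (k + 1.0) ** 2                           # polynomially increasing weights (illustrative choice)
    sum_weighted += w * x
    weight_total += w

x_mean = sum_plain / T
x_weighted = sum_weighted / weight_total

def f(z):                                        # objective value; the optimal value is 0
    return 0.5 * z @ A @ z

print(f"last iterate    : f = {f(x):.3e}")
print(f"arithmetic mean : f = {f(x_mean):.3e}")
print(f"weighted average: f = {f(x_weighted):.3e}")
```

With a large initial error, the arithmetic mean is dragged towards the early iterates, whereas weights that grow with the iteration index downweight those early iterates; this is the effect the paper quantifies and optimizes.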
Pages: 26