Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent

Cited by: 0
Authors
Liu, Kangqiao [1]
Liu, Ziyin [1]
Ueda, Masahito [1,2,3]
Affiliations
[1] Univ Tokyo, Dept Phys, Tokyo, Japan
[2] RIKEN CEMS, Wako, Saitama, Japan
[3] Univ Tokyo, Inst Phys Intelligence, Tokyo, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords: not listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
In the vanishing learning rate regime, stochastic gradient descent (SGD) is now relatively well understood. In this work, we study the basic properties of SGD and its variants in the non-vanishing learning rate regime. The focus is on deriving exactly solvable results and discussing their implications. The main contribution of this work is the derivation of the stationary distribution of discrete-time SGD for a quadratic loss function, with and without momentum; in particular, one implication of our result is that the fluctuation caused by the discrete-time dynamics takes a distorted shape and is dramatically larger than what a continuous-time theory would predict. Applications of the proposed theory considered in this work include the approximation error of variants of SGD, the effect of minibatch noise, optimal Bayesian inference, the escape rate from a sharp minimum, and the stationary covariance of several second-order methods, including damped Newton's method, natural gradient descent, and Adam.
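The abstract's central claim can be illustrated with a minimal sketch (not the authors' code). For a one-dimensional quadratic loss L(theta) = a*theta^2/2 with additive Gaussian gradient noise of scale sigma, the SGD update theta_{t+1} = (1 - eta*a)*theta_t + eta*xi_t has exact discrete-time stationary variance eta*sigma^2 / (a*(2 - eta*a)), whereas the continuous-time (Ornstein-Uhlenbeck) approximation predicts eta*sigma^2 / (2*a). The parameter values a, sigma, and eta below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: discrete-time SGD on a 1D quadratic loss with additive
# Gaussian gradient noise. Compares the empirical stationary variance with
#   discrete-time theory  : eta * sigma**2 / (a * (2 - eta * a))
#   continuous-time theory: eta * sigma**2 / (2 * a)
# All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
a, sigma = 1.0, 1.0   # loss curvature and gradient-noise scale (assumed)
eta = 1.5             # finite learning rate; stability requires eta * a < 2

theta = 0.0
burn_in, steps = 10_000, 200_000
samples = np.empty(steps)
for t in range(burn_in + steps):
    # noisy gradient of L(theta) = a * theta**2 / 2
    grad = a * theta + sigma * rng.standard_normal()
    theta -= eta * grad                  # plain discrete-time SGD update
    if t >= burn_in:
        samples[t - burn_in] = theta     # sample from the stationary regime

print(f"empirical variance     : {samples.var():.4f}")
print(f"discrete-time theory   : {eta * sigma**2 / (a * (2 - eta * a)):.4f}")
print(f"continuous-time theory : {eta * sigma**2 / (2 * a):.4f}")
```

With eta * a = 1.5, the discrete-time variance (3.0) should be roughly four times the continuous-time prediction (0.75), and the empirical variance should agree with the discrete-time value, illustrating how finite-learning-rate fluctuations exceed what a continuous-time theory predicts.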
Pages: 12
Related Papers
50 records in total
  • [1] Stochastic Gradient Descent with Polyak's Learning Rate
    Prazeres, Mariana
    Oberman, Adam M.
    [J]. JOURNAL OF SCIENTIFIC COMPUTING, 2021, 89 (01)
  • [2] Convergence diagnostics for stochastic gradient descent with constant learning rate
    Chee, Jerry
    Toulis, Panos
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [3] The effective noise of stochastic gradient descent
    Mignacco, Francesca
    Urbani, Pierfrancesco
    [J]. JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2022, 2022 (08)
  • [4] Stochastic Gradient Descent with Finite Samples Sizes
    Yuan, Kun
    Ying, Bicheng
    Vlaski, Stefan
    Sayed, Ali H.
    [J]. 2016 IEEE 26TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2016
  • [5] Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
    Nacson, Mor Shpigel
    Srebro, Nathan
    Soudry, Daniel
    [J]. 22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [6] Generalization Bounds for Label Noise Stochastic Gradient Descent
    Huh, Jung Eun
    Rebeschini, Patrick
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [7] A Modified Stochastic Gradient Descent Optimization Algorithm With Random Learning Rate for Machine Learning and Deep Learning
    Shim, Duk-Sun
    Shim, Joseph
    [J]. INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2023, 21 (11) : 3825 - 3831
  • [8] Accelerated Stochastic Gradient Descent for Minimizing Finite Sums
    Nitanda, Atsushi
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 195 - 203