Preconditioned Stochastic Gradient Descent

Cited by: 57
Authors
Li, Xi-Lin [1,2,3]
Affiliations
[1] Univ Maryland Baltimore Cty, Machine Learning Signal Proc Lab, Baltimore, MD 21228 USA
[2] Fortemedia Inc, Santa Clara, CA USA
[3] Cisco Syst Inc, San Jose, CA USA
Keywords
Neural network; Newton method; nonconvex optimization; preconditioner; stochastic gradient descent (SGD);
DOI
10.1109/TNNLS.2017.2672978
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Stochastic gradient descent (SGD) remains the workhorse for many practical problems. However, it converges slowly and can be difficult to tune. SGD can be preconditioned to accelerate its convergence remarkably, but many attempts in this direction either target specialized problems or yield methods significantly more complicated than SGD. This paper proposes a new method that adaptively estimates a preconditioner such that the amplitudes of the perturbations of the preconditioned stochastic gradient match those of the perturbations of the parameters being optimized, in a way comparable to the Newton method in deterministic optimization. Unlike preconditioners based on secant-equation fitting, as used in deterministic quasi-Newton methods, which assume a positive-definite Hessian and approximate its inverse, the new preconditioner works equally well for convex and nonconvex optimization with exact or noisy gradients. When stochastic gradients are used, it naturally damps the gradient noise to stabilize SGD. Efficient preconditioner estimation methods are developed and, with reasonable simplifications, are applicable to large-scale problems. Experimental results demonstrate that, equipped with the new preconditioner and without any tuning effort, preconditioned SGD can efficiently solve many challenging problems, such as training a deep neural network or a recurrent neural network that requires extremely long-term memory.
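The sketch below illustrates the preconditioner estimation idea summarized in the abstract. It is a minimal, simplified illustration rather than the paper's implementation: it assumes a small toy quadratic with a known indefinite Hessian H, uses exact gradient differences, parameterizes the preconditioner as P = Q^T Q with a dense factor Q, and adapts Q by plain normalized gradient steps on the perturbation-matching criterion E[dg' P dg + dtheta' inv(P) dtheta], whereas the paper develops more efficient estimation methods suited to large-scale problems. All variable names and the problem setup are illustrative.

```python
# Minimal NumPy sketch of the preconditioner-fitting idea (illustration only,
# not the author's implementation): sample a small parameter perturbation
# dtheta, measure the resulting change dg in the gradient, and adapt P so that
# the preconditioned gradient perturbation and the parameter perturbation
# have matching amplitudes.
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Toy problem: f(theta) = 0.5 * theta' H theta with an indefinite Hessian
# (eigenvalues of both signs), i.e., a nonconvex quadratic.
eigvals = np.array([-2.0, -1.0, 0.5, 1.0, 2.0])
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
H = V @ np.diag(eigvals) @ V.T

Q = np.eye(n)      # preconditioner factor; P = Q.T @ Q stays positive definite
eps = 1e-4         # size of the parameter perturbations

for step in range(5000):
    # Pair of perturbations: dtheta and the induced change dg in the gradient.
    dtheta = eps * rng.standard_normal(n)
    dg = H @ dtheta                      # exact gradient difference of 0.5*x'Hx

    # Gradient (up to a factor of 2) of the criterion
    #   dg' P dg + dtheta' inv(P) dtheta   with respect to Q, where P = Q'Q.
    P = Q.T @ Q
    u = np.linalg.solve(P, dtheta)       # inv(P) @ dtheta
    grad_Q = Q @ (np.outer(dg, dg) - np.outer(u, u))

    # Normalized step with a decaying size (a crude stand-in for the efficient
    # factor updates developed in the paper).
    lr = 0.1 / (1.0 + step / 100.0)
    Q -= lr * grad_Q / (np.linalg.norm(grad_Q) + 1e-12)

# For this quadratic the criterion is minimized by P = |H|^{-1}, the inverse of
# the absolute Hessian, so the fit should land close to it (relative error well
# below 1 under these settings).
P = Q.T @ Q
abs_H_inv = V @ np.diag(1.0 / np.abs(eigvals)) @ V.T
print("relative error of P vs |H|^-1:",
      np.linalg.norm(P - abs_H_inv) / np.linalg.norm(abs_H_inv))

# Preconditioned SGD then iterates  theta <- theta - mu * P @ grad(theta).
# With noisy gradients the dg*dg' term in the criterion grows, so the fitted P
# shrinks accordingly, which is the noise-damping behavior noted in the abstract.
```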
Pages: 1454 - 1466
Number of pages: 13
Related Papers
50 records in total
  • [1] Stochastic Gradient Descent with Preconditioned Polyak Step-Size
    Abdukhakimov, F.
    Xiang, C.
    Kamzolov, D.
    Takac, M.
    [J]. COMPUTATIONAL MATHEMATICS AND MATHEMATICAL PHYSICS, 2024, 64 (04) : 621 - 634
  • [2] Preconditioned Stochastic Gradient Descent Optimisation for Monomodal Image Registration
    Klein, Stefan
    Staring, Marius
    Andersson, Patrik
    Pluim, Josien P. W.
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION (MICCAI 2011), PT II, 2011, 6892 : 549+
  • [3] Efficient preconditioned stochastic gradient descent for estimation in latent variable models
    Baey, Charlotte
    Delattre, Maud
    Kuhn, Estelle
    Leger, Jean-Benoist
    Lemler, Sarah
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023
  • [4] Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss
    Mussmann, Stephen
    Liang, Percy
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [5] Stochastic Gradient Methods with Preconditioned Updates
    Sadiev, Abdurakhmon
    Beznosikov, Aleksandr
    Almansoori, Abdulla Jasem
    Kamzolov, Dmitry
    Tappenden, Rachael
    Takac, Martin
    [J]. JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2024, 201 (02) : 471 - 489
  • [6] Constrained and Preconditioned Stochastic Gradient Method
    Jiang, Hong
    Huang, Gang
    Wilford, Paul A.
    Yu, Liangkai
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2015, 63 (10) : 2678 - 2691
  • [7] Unforgeability in Stochastic Gradient Descent
    Baluta, Teodora
    Nikolic, Ivica
    Jain, Racchit
    Aggarwal, Divesh
    Saxena, Prateek
    [J]. PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1138 - 1152
  • [8] Stochastic gradient descent tricks
    Bottou, Léon
    [J]. Lecture Notes in Computer Science, 2012, 7700 : 421 - 436
  • [9] Stochastic Reweighted Gradient Descent
    El Hanchi, Ayoub
    Stephens, David A.
    Maddison, Chris J.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022