Preconditioned Stochastic Gradient Descent

Cited by: 57
Authors
Li, Xi-Lin [1,2,3]
Affiliations
[1] Univ Maryland Baltimore Cty, Machine Learning Signal Proc Lab, Baltimore, MD 21228 USA
[2] Fortemedia Inc, Santa Clara, CA USA
[3] Cisco Syst Inc, San Jose, CA USA
Keywords
Neural network; Newton method; nonconvex optimization; preconditioner; stochastic gradient descent (SGD);
DOI
10.1109/TNNLS.2017.2672978
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Stochastic gradient descent (SGD) remains the workhorse for many practical problems. However, it converges slowly and can be difficult to tune. SGD can be preconditioned to accelerate its convergence remarkably, but many attempts in this direction either target specialized problems or yield methods significantly more complicated than SGD. This paper proposes a new method that adaptively estimates a preconditioner such that the amplitudes of the perturbations of the preconditioned stochastic gradient match those of the perturbations of the parameters being optimized, in a way comparable to the Newton method in deterministic optimization. Unlike preconditioners based on secant-equation fitting, as used in deterministic quasi-Newton methods, which assume a positive-definite Hessian and approximate its inverse, the new preconditioner works equally well for convex and nonconvex optimization with exact or noisy gradients. When stochastic gradients are used, it naturally damps the gradient noise to stabilize SGD. Efficient preconditioner estimation methods are developed and, with reasonable simplifications, are applicable to large-scale problems. Experimental results demonstrate that, equipped with the new preconditioner and without any tuning effort, preconditioned SGD can efficiently solve many challenging problems, such as training a deep neural network or a recurrent neural network that requires extremely long-term memory.
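The sketch below illustrates the preconditioner estimation idea summarized in the abstract. It is a minimal, simplified illustration rather than the paper's implementation: it assumes a small toy quadratic with a known indefinite Hessian H, uses exact gradient differences, parameterizes the preconditioner as P = Q^T Q with a dense factor Q, and adapts Q by plain normalized gradient steps on the perturbation-matching criterion E[dg' P dg + dtheta' inv(P) dtheta], whereas the paper develops more efficient estimation methods suited to large-scale problems. All variable names and the problem setup are illustrative.

```python
# Minimal NumPy sketch of the preconditioner-fitting idea (illustration only,
# not the author's implementation): sample a small parameter perturbation
# dtheta, measure the resulting change dg in the gradient, and adapt P so that
# the preconditioned gradient perturbation and the parameter perturbation
# have matching amplitudes.
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Toy problem: f(theta) = 0.5 * theta' H theta with an indefinite Hessian
# (eigenvalues of both signs), i.e., a nonconvex quadratic.
eigvals = np.array([-2.0, -1.0, 0.5, 1.0, 2.0])
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
H = V @ np.diag(eigvals) @ V.T

Q = np.eye(n)      # preconditioner factor; P = Q.T @ Q stays positive definite
eps = 1e-4         # size of the parameter perturbations

for step in range(5000):
    # Pair of perturbations: dtheta and the induced change dg in the gradient.
    dtheta = eps * rng.standard_normal(n)
    dg = H @ dtheta                      # exact gradient difference of 0.5*x'Hx

    # Gradient (up to a factor of 2) of the criterion
    #   dg' P dg + dtheta' inv(P) dtheta   with respect to Q, where P = Q'Q.
    P = Q.T @ Q
    u = np.linalg.solve(P, dtheta)       # inv(P) @ dtheta
    grad_Q = Q @ (np.outer(dg, dg) - np.outer(u, u))

    # Normalized step with a decaying size (a crude stand-in for the efficient
    # factor updates developed in the paper).
    lr = 0.1 / (1.0 + step / 100.0)
    Q -= lr * grad_Q / (np.linalg.norm(grad_Q) + 1e-12)

# For this quadratic the criterion is minimized by P = |H|^{-1}, the inverse of
# the absolute Hessian, so the fit should land close to it (relative error well
# below 1 under these settings).
P = Q.T @ Q
abs_H_inv = V @ np.diag(1.0 / np.abs(eigvals)) @ V.T
print("relative error of P vs |H|^-1:",
      np.linalg.norm(P - abs_H_inv) / np.linalg.norm(abs_H_inv))

# Preconditioned SGD then iterates  theta <- theta - mu * P @ grad(theta).
# With noisy gradients the dg*dg' term in the criterion grows, so the fitted P
# shrinks accordingly, which is the noise-damping behavior noted in the abstract.
```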
Pages: 1454 - 1466
Number of pages: 13
Related Papers
50 records in total
  • [1] Stochastic Gradient Descent with Preconditioned Polyak Step-Size
    Abdukhakimov, F.
    Xiang, C.
    Kamzolov, D.
    Takac, M.
    [J]. COMPUTATIONAL MATHEMATICS AND MATHEMATICAL PHYSICS, 2024, 64 (04) : 621 - 634
  • [2] Preconditioned Stochastic Gradient Descent Optimisation for Monomodal Image Registration
    Klein, Stefan
    Staring, Marius
    Andersson, Patrik
    Pluim, Josien P. W.
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION (MICCAI 2011), PT II, 2011, 6892 : 549+
  • [3] Efficient preconditioned stochastic gradient descent for estimation in latent variable models
    Baey, Charlotte
    Delattre, Maud
    Kuhn, Estelle
    Leger, Jean-Benoist
    Lemler, Sarah
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023
  • [4] Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss
    Mussmann, Stephen
    Liang, Percy
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [5] Stochastic Gradient Methods with Preconditioned Updates
    Sadiev, Abdurakhmon
    Beznosikov, Aleksandr
    Almansoori, Abdulla Jasem
    Kamzolov, Dmitry
    Tappenden, Rachael
    Takac, Martin
    [J]. JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2024, 201 (02) : 471 - 489
  • [6] Constrained and Preconditioned Stochastic Gradient Method
    Jiang, Hong
    Huang, Gang
    Wilford, Paul A.
    Yu, Liangkai
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2015, 63 (10) : 2678 - 2691
  • [7] Unforgeability in Stochastic Gradient Descent
    Baluta, Teodora
    Nikolic, Ivica
    Jain, Racchit
    Aggarwal, Divesh
    Saxena, Prateek
    [J]. PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1138 - 1152
  • [8] Stochastic gradient descent tricks
    Bottou, Léon
    [J]. Lecture Notes in Computer Science, 2012, 7700 : 421 - 436
  • [9] Stochastic Reweighted Gradient Descent
    El Hanchi, Ayoub
    Stephens, David A.
    Maddison, Chris J.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022