Stochastic Gradient Methods with Preconditioned Updates

Cited by: 0

Authors
Abdurakhmon Sadiev
Aleksandr Beznosikov
Abdulla Jasem Almansoori
Dmitry Kamzolov
Rachael Tappenden
Martin Takáč
Affiliations
[1] Ivannikov Institute for System Programming of the Russian Academy of Sciences (ISP RAS)
[2] Moscow Institute of Physics and Technology (MIPT)
[3] Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
[4] University of Canterbury
Keywords
Optimization; Non-convex optimization; Stochastic optimization; Scaled methods; Variance reduction
DOI: not available
Abstract
This work considers the non-convex finite-sum minimization problem. There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner based on Hutchinson’s approach to approximating the diagonal of the Hessian and couple it with several gradient-based methods to give new ‘scaled’ algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical complexity guarantees under smoothness assumptions are presented. We prove linear convergence when both smoothness and the PL-condition are assumed. Our adaptively scaled methods use approximate partial second-order curvature information and, therefore, can better mitigate the impact of badly scaled problems. This improved practical performance is demonstrated in the numerical experiments also presented in this work.
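The preconditioner described in the abstract relies on Hutchinson's estimator, which recovers the diagonal of the Hessian from Hessian-vector products: diag(H) = E[z ⊙ Hz] for Rademacher vectors z. The sketch below illustrates this idea and how such an estimate can rescale a gradient step; it is a minimal illustration of the general technique, not the authors' Scaled SARAH / Scaled L-SVRG algorithms, and the function and parameter names are chosen here for exposition.

```python
import numpy as np

def hutchinson_diag_hessian(hvp, dim, num_samples=200, rng=None):
    """Estimate diag(H) via Hutchinson: average z * (H @ z) over Rademacher z."""
    rng = np.random.default_rng(rng)
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += z * hvp(z)                       # elementwise product with Hz
    return est / num_samples

# Example: quadratic f(x) = 0.5 x^T A x, so the Hessian is A and hvp(z) = A @ z.
A = np.diag([1.0, 10.0, 100.0])  # a badly scaled (ill-conditioned) Hessian
diag_est = hutchinson_diag_hessian(lambda z: A @ z, dim=3, num_samples=500)

# A scaled update divides the gradient elementwise by the estimated curvature,
# equalizing progress across directions with very different scales:
grad = A @ np.array([1.0, 1.0, 1.0])
scaled_step = grad / np.maximum(np.abs(diag_est), 1e-8)
```

For a diagonal Hessian the estimator is exact (z ⊙ Az = diag(A) since z² = 1 elementwise), so the scaled step treats all three coordinates uniformly despite the 100× spread in curvature. In the paper's setting the same diagonal estimate is maintained across stochastic iterations and combined with variance-reduced gradients.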
Pages: 471-489 (18 pages)