Stochastic quasi-gradient methods: variance reduction via Jacobian sketching

Cited by: 20
Authors
Gower, Robert M. [1]
Richtarik, Peter [2,3,4]
Bach, Francis [5]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, Palaiseau, France
[2] King Abdullah Univ Sci & Technol KAUST, Thuwal, Saudi Arabia
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[4] Moscow Inst Phys & Technol MIPT, Dolgoprudnyi, Russia
[5] PSL Res Univ, INRIA, ENS, Paris, France
Funding
European Research Council
Keywords
Stochastic gradient descent; Sketching; Variance reduction; Covariates; JOHNSON-LINDENSTRAUSS; OPTIMIZATION;
DOI
10.1007/s10107-020-01506-0
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method, JacSketch, is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. Our method can also be seen as stochastic gradient descent applied to a controlled stochastic optimization reformulation of the original problem, where the control comes from the Jacobian estimates. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches, featuring a novel proof technique based on a stochastic Lyapunov function. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the celebrated stochastic average gradient (SAGA) method, and its several existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al. (Proceedings of the eighteenth international conference on artificial intelligence and statistics, AISTATS 2015, San Diego, California, 2015). The rates we obtain for minibatch SAGA are also superior to existing rates and are sufficiently tight as to show a decrease in total complexity as the minibatch size increases. Moreover, we obtain the first minibatch SAGA method with importance sampling.
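The abstract notes that specializing JacSketch to particular sketching strategies recovers SAGA: when the sketch picks out a single column of the Jacobian, the projection step simply overwrites the stored gradient of the sampled function, and the variance-reduced estimator takes the familiar SAGA form. The following minimal NumPy sketch illustrates that special case on a toy least-squares problem; the problem, step size, and all variable names are illustrative assumptions and not taken from the paper.

```python
# Illustrative sketch of JacSketch with single-coordinate sketches S_k = e_i,
# which recovers a SAGA-style update (a reading of the abstract, not the
# authors' implementation).
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (assumption): f_i(x) = 0.5 * (a_i^T x - b_i)^2, minimize the average.
n, d = 200, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    """Gradient of the i-th component function f_i at x."""
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
J = np.zeros((d, n))  # column J[:, i] is the current estimate of grad f_i
step = 0.1 / np.max(np.sum(A**2, axis=1))  # heuristic step size (assumption)

for k in range(5000):
    i = rng.integers(n)          # single-coordinate sketch: measure one column
    g_new = grad_i(x, i)         # sketched part of the true Jacobian
    # Variance-reduced, unbiased gradient estimate (SAGA form):
    g = J.mean(axis=1) + (g_new - J[:, i])
    J[:, i] = g_new              # projection step: overwrite the sampled column
    x -= step * g

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```

Richer sketches (e.g. sampling a minibatch of columns) change only which part of the Jacobian estimate is refreshed per iteration, which is how the minibatch and importance-sampling variants discussed in the abstract arise.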
Pages: 135-192
Number of pages: 58
Related papers (50 total)
  • [21] PROXIMAL STOCHASTIC GRADIENT METHOD WITH PROGRESSIVE VARIANCE REDUCTION
    Xiao, Lin
    Zhang, Tong
    SIAM JOURNAL ON OPTIMIZATION, 2014, 24 (04) : 2057 - 2075
  • [22] STOCHASTIC QUASI-GRADIENT ALGORITHM OF THE GLOBAL OPTIMIZATION FOR SOLVING PROBLEMS OF THE INHOMOGENEOUS CHEMICOTECHNOLOGICAL SYSTEM SYNTHESIS
    KAFAROV, VV
    MESHALKIN, VP
    SIVAEV, SB
    PENTSIAK, I
    DOKLADY AKADEMII NAUK SSSR, 1984, 275 (03): : 670 - 674
  • [23] Accelerating variance-reduced stochastic gradient methods
    Driggs, Derek
    Ehrhardt, Matthias J.
    Schönlieb, Carola-Bibiane
    MATHEMATICAL PROGRAMMING, 2022, 191 (02) : 671 - 715
  • [25] Stochastic Variance Reduction for Variational Inequality Methods
    Alacaoglu, Ahmet
    Malitsky, Yura
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178 : 778 - 816
  • [26] Stochastic Variance Reduction Methods for Policy Evaluation
    Du, Simon S.
    Chen, Jianshu
    Li, Lihong
    Xiao, Lin
    Zhou, Dengyong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [27] MIXED STOCHASTIC QUASI-GRADIENT AND PENALTY-FUNCTION METHOD FOR SOLVING MINIMAX PROBLEMS WITH COUPLED VARIABLES
    ZAVRIEV, SK
    PEREVOZCHIKOV, AG
    CYBERNETICS AND SYSTEMS ANALYSIS, 1991, 27 (06) : 883 - 888
  • [28] On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
    Reddi, Sashank J.
    Hefny, Ahmed
    Sra, Suvrit
    Poczos, Barnabas
    Smola, Alex
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [29] Nonconvex optimization with inertial proximal stochastic variance reduction gradient
    He, Lulu
    Ye, Jimin
    Jianwei, E.
    INFORMATION SCIENCES, 2023, 648
  • [30] Some variance reduction methods for numerical stochastic homogenization
    Blanc, X.
    Le Bris, C.
    Legoll, F.
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2016, 374 (2066)