Stochastic quasi-gradient methods: variance reduction via Jacobian sketching

Cited by: 20
Authors
Gower, Robert M. [1]
Richtarik, Peter [2,3,4]
Bach, Francis [5]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, Palaiseau, France
[2] King Abdullah Univ Sci & Technol KAUST, Thuwal, Saudi Arabia
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[4] Moscow Inst Phys & Technol MIPT, Dolgoprudnyi, Russia
[5] PSL Res Univ, INRIA, ENS, Paris, France
Funding
European Research Council
Keywords
Stochastic gradient descent; Sketching; Variance reduction; Covariates; JOHNSON-LINDENSTRAUSS; OPTIMIZATION;
DOI
10.1007/s10107-020-01506-0
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method, JacSketch, is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. Our method can also be seen as stochastic gradient descent applied to a controlled stochastic optimization reformulation of the original problem, where the control comes from the Jacobian estimates. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches, featuring a novel proof technique based on a stochastic Lyapunov function. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the celebrated stochastic average gradient (SAGA) method, and its several existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al. (Proceedings of the eighteenth international conference on artificial intelligence and statistics, AISTATS 2015, San Diego, California, 2015). The rates we obtain for minibatch SAGA are also superior to existing rates and are sufficiently tight as to show a decrease in total complexity as the minibatch size increases. Moreover, we obtain the first minibatch SAGA method with importance sampling.
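The abstract notes that specializing JacSketch to particular sketching strategies recovers SAGA: when the sketch picks out a single column of the Jacobian, the projection step simply overwrites the stored gradient of the sampled function, and the variance-reduced estimator takes the familiar SAGA form. The following minimal NumPy sketch illustrates that special case on a toy least-squares problem; the problem, step size, and all variable names are illustrative assumptions and not taken from the paper.

```python
# Illustrative sketch of JacSketch with single-coordinate sketches S_k = e_i,
# which recovers a SAGA-style update (a reading of the abstract, not the
# authors' implementation).
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (assumption): f_i(x) = 0.5 * (a_i^T x - b_i)^2, minimize the average.
n, d = 200, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    """Gradient of the i-th component function f_i at x."""
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
J = np.zeros((d, n))  # column J[:, i] is the current estimate of grad f_i
step = 0.1 / np.max(np.sum(A**2, axis=1))  # heuristic step size (assumption)

for k in range(5000):
    i = rng.integers(n)          # single-coordinate sketch: measure one column
    g_new = grad_i(x, i)         # sketched part of the true Jacobian
    # Variance-reduced, unbiased gradient estimate (SAGA form):
    g = J.mean(axis=1) + (g_new - J[:, i])
    J[:, i] = g_new              # projection step: overwrite the sampled column
    x -= step * g

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```

Richer sketches (e.g. sampling a minibatch of columns) change only which part of the Jacobian estimate is refreshed per iteration, which is how the minibatch and importance-sampling variants discussed in the abstract arise.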
Pages: 135-192
Number of pages: 58
Related papers (50 total)
  • [21] PROXIMAL STOCHASTIC GRADIENT METHOD WITH PROGRESSIVE VARIANCE REDUCTION
    Xiao, Lin
    Zhang, Tong
    SIAM JOURNAL ON OPTIMIZATION, 2014, 24 (04) : 2057 - 2075
  • [22] STOCHASTIC QUASI-GRADIENT ALGORITHM OF THE GLOBAL OPTIMIZATION FOR SOLVING PROBLEMS OF THE INHOMOGENEOUS CHEMICOTECHNOLOGICAL SYSTEM SYNTHESIS
    KAFAROV, VV
    MESHALKIN, VP
    SIVAEV, SB
    PENTSIAK, I
    DOKLADY AKADEMII NAUK SSSR, 1984, 275 (03): : 670 - 674
  • [23] Accelerating variance-reduced stochastic gradient methods
    Driggs, Derek
    Ehrhardt, Matthias J.
    Schönlieb, Carola-Bibiane
    MATHEMATICAL PROGRAMMING, 2022, 191 (02) : 671 - 715
  • [25] Stochastic Variance Reduction for Variational Inequality Methods
    Alacaoglu, Ahmet
    Malitsky, Yura
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178 : 778 - 816
  • [26] Stochastic Variance Reduction Methods for Policy Evaluation
    Du, Simon S.
    Chen, Jianshu
    Li, Lihong
    Xiao, Lin
    Zhou, Dengyong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [27] MIXED STOCHASTIC QUASI-GRADIENT AND PENALTY-FUNCTION METHOD FOR SOLVING MINIMAX PROBLEMS WITH COUPLED VARIABLES
    ZAVRIEV, SK
    PEREVOZCHIKOV, AG
    CYBERNETICS AND SYSTEMS ANALYSIS, 1991, 27 (06) : 883 - 888
  • [28] On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
    Reddi, Sashank J.
    Hefny, Ahmed
    Sra, Suvrit
    Poczos, Barnabas
    Smola, Alex
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [29] Nonconvex optimization with inertial proximal stochastic variance reduction gradient
    He, Lulu
    Ye, Jimin
    Jianwei, E.
    INFORMATION SCIENCES, 2023, 648
  • [30] Some variance reduction methods for numerical stochastic homogenization
    Blanc, X.
    Le Bris, C.
    Legoll, F.
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2016, 374 (2066)