SEGA: Variance Reduction via Gradient Sketching

Cited by: 0
Authors
Hanzely, Filip [1]
Mishchenko, Konstantin [1]
Richtarik, Peter [1,2,3]
Affiliations
[1] King Abdullah Univ Sci & Technol, Thuwal, Saudi Arabia
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[3] Moscow Inst Phys & Technol, Dolgoprudnyi, Russia
Keywords
COORDINATE DESCENT; OPTIMIZATION
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose a randomized first-order optimization method, SEGA (SkEtched GrAdient), which progressively builds a variance-reduced estimate of the gradient throughout its iterations from random linear measurements (sketches) of the gradient. In each iteration, SEGA updates the current estimate of the gradient via a sketch-and-project operation using the information provided by the latest sketch, and then uses a random relaxation procedure to turn this estimate into an unbiased estimate of the true gradient. This unbiased estimate is then used to perform a gradient step. Unlike standard subspace descent methods such as coordinate descent, SEGA can be applied to optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with techniques such as importance sampling, minibatching, and acceleration, and its rate matches, up to a small constant factor, the best-known rate of coordinate descent.
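To make the coordinate-sketch special case of the abstract concrete, the following is a minimal, hypothetical sketch, not the authors' implementation: it assumes a toy quadratic objective f(x) = 0.5 x'Ax - b'x, uniform coordinate sampling, no proximal term, and a conservative hand-picked step size. The array `h` plays the role of the running gradient estimate, and `g` is the unbiased gradient estimate built from the latest coordinate measurement.

```python
import numpy as np

# Minimal sketch of SEGA with coordinate sketches (illustrative assumptions only):
# toy quadratic objective, uniform coordinate sampling, fixed step size, no prox.

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # positive-definite Hessian
b = rng.standard_normal(n)
grad = lambda x: A @ x - b               # full gradient (only one entry is "measured" per step)

x = np.zeros(n)                          # iterate x^k
h = np.zeros(n)                          # running gradient estimate h^k
alpha = 1.0 / (2 * n * np.linalg.norm(A, 2))   # conservative hand-picked step size

for k in range(20000):
    i = rng.integers(n)                  # random coordinate sketch S = e_i
    gi = grad(x)[i]                      # sketched measurement: i-th partial derivative
    delta = gi - h[i]
    g = h.copy()                         # unbiased estimate g^k = h^k + n * e_i * (grad_i f(x^k) - h_i^k)
    g[i] += n * delta
    h[i] += delta                        # sketch-and-project update of h^k
    x -= alpha * g                       # gradient step (a prox step would replace this for composite problems)

x_star = np.linalg.solve(A, b)
print("distance to minimizer:", np.linalg.norm(x - x_star))
```

With uniform sampling, the relaxation factor n makes g an unbiased estimate of the true gradient (its expectation over the sampled coordinate equals A x - b), while h itself converges to the gradient at the solution, which is what removes the variance over time in this sketch.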
Pages: 12