Stochastic Gradient Descent with Preconditioned Polyak Step-Size

Cited: 0
Authors
Abdukhakimov, F. [1 ]
Xiang, C. [1 ]
Kamzolov, D. [1 ]
Takac, M. [1 ]
Affiliations
[1] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Keywords
machine learning; optimization; adaptive step-size; Polyak step-size; preconditioning; approximation; estimator
DOI
10.1134/S0965542524700052
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
Stochastic Gradient Descent (SGD) is one of many iterative optimization methods widely used to solve machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one weakness of this class of methods is the need to tune the learning rate (step-size) for every combination of loss function and dataset in order to solve an optimization problem efficiently within a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) offers an update rule that alleviates the need to fine-tune the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
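
The abstract describes the update only in words; the following minimal NumPy sketch illustrates the idea of combining a Polyak step-size with a diagonal preconditioner. The weighted-norm form of the step, gamma = (f_i(x) - f_i^*) / ||grad f_i(x)||^2_{D^{-1}}, follows the SPS line of work cited below (entry [7]); the AdaGrad-style accumulator, the function name psps_step, and the per-sample optimum f_i^* = 0 are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def psps_step(x, g, loss, d, f_star=0.0, eps=1e-12):
    # Polyak step-size measured in the D^{-1}-weighted norm:
    #   gamma = (f_i(x) - f_i^*) / (g^T D^{-1} g)
    gamma = (loss - f_star) / max(g @ (g / d), eps)
    # Preconditioned update: x <- x - gamma * D^{-1} g
    return x - gamma * g / d

# Toy least-squares run with an AdaGrad-style diagonal preconditioner.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
b = A @ rng.normal(size=5)        # consistent system, so f_i^* = 0 per sample
x = np.zeros(5)
acc = np.zeros(5)                 # AdaGrad second-moment accumulator
for _ in range(500):
    i = rng.integers(100)
    r = A[i] @ x - b[i]
    g = r * A[i]                  # stochastic gradient of f_i(x) = 0.5 * r^2
    acc += g * g
    d = np.sqrt(acc) + 1e-8       # diagonal of the preconditioner D
    x = psps_step(x, g, 0.5 * r * r, d)

With the identity preconditioner (d equal to all ones) this reduces to the classical stochastic Polyak step; the diagonal d rescales each coordinate, which is what helps on badly scaled data.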
Pages: 621–634
Page count: 14
Related Papers
50 results in total
  • [1] Li, Changhao; Ma, Zhixin; Sun, Dazhi; Zhang, Guoming; Wen, Jinming. Stochastic IHT With Stochastic Polyak Step-Size for Sparse Signal Recovery. IEEE Signal Processing Letters, 2024, 31: 2035–2039.
  • [2] Li, Xi-Lin. Preconditioned Stochastic Gradient Descent. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(5): 1454–1466.
  • [3] Ren, Tongzheng; Cui, Fuheng; Atsidakou, Alexia; Sanghavi, Sujay; Ho, Nhat. Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent. International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 151, 2022.
  • [6] Mathews, V.J.; Xie, Z.H. A Stochastic Gradient Adaptive Filter with Gradient Adaptive Step-Size. IEEE Transactions on Signal Processing, 1993, 41(6): 2075–2087.
  • [7] Loizou, Nicolas; Vaswani, Sharan; Laradji, Issam; Lacoste-Julien, Simon. Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence. 24th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 130, 2021.
  • [8] Talebi, Sayed Pouria; Darvishi, Hossein; Werner, Stefan; Rossi, Pierluigi Salvo. Gradient-Descent Adaptive Filtering Using Gradient Adaptive Step-Size. 2022 IEEE 12th Sensor Array and Multichannel Signal Processing Workshop (SAM), 2022: 321–325.
  • [10] Prazeres, Mariana; Oberman, Adam M. Stochastic Gradient Descent with Polyak's Learning Rate. Journal of Scientific Computing, 2021, 89(1).