Stochastic Gradient Descent with Preconditioned Polyak Step-Size

Cited: 0
Authors
Abdukhakimov, F. [1 ]
Xiang, C. [1 ]
Kamzolov, D. [1 ]
Takac, M. [1 ]
Affiliations
[1] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Keywords
machine learning; optimization; adaptive step-size; Polyak step-size; preconditioning; approximation; estimator
DOI
10.1134/S0965542524700052
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
Stochastic Gradient Descent (SGD) is one of many iterative optimization methods widely used to solve machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one weakness of this class of methods is the need to tune the learning rate (step-size) for every combination of loss function and dataset in order to solve an optimization problem efficiently within a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) offers an update rule that alleviates the need to fine-tune the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
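
The abstract describes the update only in words; the following minimal NumPy sketch illustrates the idea of combining a Polyak step-size with a diagonal preconditioner. The weighted-norm form of the step, gamma = (f_i(x) - f_i^*) / ||grad f_i(x)||^2_{D^{-1}}, follows the SPS line of work cited below (entry [7]); the AdaGrad-style accumulator, the function name psps_step, and the per-sample optimum f_i^* = 0 are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def psps_step(x, g, loss, d, f_star=0.0, eps=1e-12):
    # Polyak step-size measured in the D^{-1}-weighted norm:
    #   gamma = (f_i(x) - f_i^*) / (g^T D^{-1} g)
    gamma = (loss - f_star) / max(g @ (g / d), eps)
    # Preconditioned update: x <- x - gamma * D^{-1} g
    return x - gamma * g / d

# Toy least-squares run with an AdaGrad-style diagonal preconditioner.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
b = A @ rng.normal(size=5)        # consistent system, so f_i^* = 0 per sample
x = np.zeros(5)
acc = np.zeros(5)                 # AdaGrad second-moment accumulator
for _ in range(500):
    i = rng.integers(100)
    r = A[i] @ x - b[i]
    g = r * A[i]                  # stochastic gradient of f_i(x) = 0.5 * r^2
    acc += g * g
    d = np.sqrt(acc) + 1e-8       # diagonal of the preconditioner D
    x = psps_step(x, g, 0.5 * r * r, d)

With the identity preconditioner (d equal to all ones) this reduces to the classical stochastic Polyak step; the diagonal d rescales each coordinate, which is what helps on badly scaled data.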
Pages: 621–634
Page count: 14
Related Papers
50 results in total
  • [1] Li, Changhao; Ma, Zhixin; Sun, Dazhi; Zhang, Guoming; Wen, Jinming. Stochastic IHT With Stochastic Polyak Step-Size for Sparse Signal Recovery. IEEE Signal Processing Letters, 2024, 31: 2035–2039.
  • [2] Li, Xi-Lin. Preconditioned Stochastic Gradient Descent. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(5): 1454–1466.
  • [3] Ren, Tongzheng; Cui, Fuheng; Atsidakou, Alexia; Sanghavi, Sujay; Ho, Nhat. Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent. International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 151, 2022.
  • [6] Mathews, V.J.; Xie, Z.H. A Stochastic Gradient Adaptive Filter with Gradient Adaptive Step-Size. IEEE Transactions on Signal Processing, 1993, 41(6): 2075–2087.
  • [7] Loizou, Nicolas; Vaswani, Sharan; Laradji, Issam; Lacoste-Julien, Simon. Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence. 24th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 130, 2021.
  • [8] Talebi, Sayed Pouria; Darvishi, Hossein; Werner, Stefan; Rossi, Pierluigi Salvo. Gradient-Descent Adaptive Filtering Using Gradient Adaptive Step-Size. 2022 IEEE 12th Sensor Array and Multichannel Signal Processing Workshop (SAM), 2022: 321–325.
  • [10] Prazeres, Mariana; Oberman, Adam M. Stochastic Gradient Descent with Polyak's Learning Rate. Journal of Scientific Computing, 2021, 89(1).