Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization

Cited by: 0
Authors
Amir, Idan [1 ]
Livni, Roi [1 ]
Srebro, Nathan [2 ]
Affiliations
[1] Tel Aviv University, Tel Aviv, Israel
[2] Toyota Technological Institute at Chicago, Chicago, IL, USA
Keywords
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e. where each instantaneous loss is a scalar convex function of a linear function. We show that in this setting, early-stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $\epsilon$ (compared to the best possible with unit Euclidean norm) with an optimal, up to logarithmic factors, sample complexity of $\tilde{O}(1/\epsilon^2)$ and only $\tilde{O}(1/\epsilon^2)$ iterations. This contrasts with general stochastic convex optimization, where $\tilde{O}(1/\epsilon^4)$ iterations are needed (Amir et al. [2]). The lower iteration complexity is ensured by leveraging uniform convergence rather than stability. But instead of uniform convergence in a norm ball, which we show can only guarantee suboptimal learning with $\Theta(1/\epsilon^4)$ samples, we rely on uniform convergence in a distribution-dependent ball.
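To make the abstract's setting concrete, here is a minimal sketch of unprojected, early-stopped GD on the empirical risk of a generalized linear objective. The function name `early_stopped_gd`, the absolute-loss example, and the particular choices of step size $\eta = 1/\sqrt{T}$ and stopping time $T \approx 1/\epsilon^2$ are illustrative assumptions in the spirit of the abstract, not the paper's exact algorithm or tuning.

```python
import numpy as np

def early_stopped_gd(X, y, eps, loss_grad):
    """Plain GD on the empirical risk of a generalized linear objective.

    X: (n, d) data matrix; y: (n,) targets.
    loss_grad(z, y): (sub)gradient of the scalar convex loss at z = <w, x>.
    Returns the averaged iterate; no projection onto a norm ball is applied.
    """
    n, d = X.shape
    T = int(np.ceil(1.0 / eps ** 2))   # ~1/eps^2 iterations (up to log factors)
    eta = 1.0 / np.sqrt(T)             # textbook step size for Lipschitz losses
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for _ in range(T):
        z = X @ w                       # linear predictions <w, x_i>
        g = X.T @ loss_grad(z, y) / n   # gradient of the empirical risk
        w = w - eta * g                 # unprojected, unregularized step
        w_avg += w / T                  # running average of iterates
    return w_avg

# Usage: absolute loss |z - y|, convex and 1-Lipschitz in its first argument.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) / np.sqrt(5)   # roughly unit-norm features
w_star = rng.normal(size=5)
w_star /= np.linalg.norm(w_star)             # unit-Euclidean-norm comparator
y = X @ w_star + 0.1 * rng.normal(size=200)
w_hat = early_stopped_gd(X, y, eps=0.05,
                         loss_grad=lambda z, t: np.sign(z - t))
print("distance to comparator:", np.linalg.norm(w_hat - w_star))
```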
Pages: 12
Related Papers
50 records in total; items [21]–[30] shown below
  • [21] Generalized stochastic Frank-Wolfe algorithm with stochastic "substitute" gradient for structured convex optimization
    Lu, Haihao
    Freund, Robert M.
    MATHEMATICAL PROGRAMMING, 2021, 187 (1-2) : 317 - 349
  • [22] Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization
    Haghifam, Mahdi
    Rodriguez-Galvez, Borja
    Thobaben, Ragnar
    Skoglund, Mikael
    Roy, Daniel M.
    Dziugaite, Gintare Karolina
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023, 201 : 663 - 706
  • [24] Ant colony optimization and stochastic gradient descent
    Meuleau, N
    Dorigo, M
    ARTIFICIAL LIFE, 2002, 8 (02) : 103 - 121
  • [25] Stochastic gradient descent for wind farm optimization
    Quick, Julian
    Rethore, Pierre-Elouan
    Pedersen, Mads Molgaard
    Rodrigues, Rafael Valotta
    Friis-Moller, Mikkel
    WIND ENERGY SCIENCE, 2023, 8 (08) : 1235 - 1250
  • [26] Stochastic Chebyshev Gradient Descent for Spectral Optimization
    Han, Insu
    Avron, Haim
    Shin, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [27] Stochastic gradient descent for optimization for nuclear systems
    Williams, Austin
    Walton, Noah
    Maryanski, Austin
    Bogetic, Sandra
    Hines, Wes
    Sobes, Vladimir
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [28] Efficient displacement convex optimization with particle gradient descent
    Daneshmand, Hadi
    Lee, Jason D.
    Jin, Chi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [29] Evolutionary Gradient Descent for Non-convex Optimization
    Xue, Ke
    Qian, Chao
    Xu, Ling
    Fei, Xudong
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3221 - 3227
  • [30] Online convex optimization in the bandit setting: gradient descent without a gradient
    Flaxman, Abraham D.
    Kalai, Adam Tauman
    McMahan, H. Brendan
    PROCEEDINGS OF THE SIXTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2005, : 385 - 394