Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares

Cited by: 0
Authors
Raj, Anant [1 ,2 ]
Barsbey, Melih [3 ]
Gurbuzbalaban, Mert [4 ]
Zhu, Lingjiong [5 ]
Simsekli, Umut [6 ]
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] PSL Res Univ, INRIA, Ecole Normale Super, Paris, France
[3] Bogazici Univ, Dept Comp Engn, Istanbul, Turkiye
[4] Rutgers State Univ, Dept Management Sci & Informat Syst, Piscataway, NJ USA
[5] Florida State Univ, Dept Math, Tallahassee, FL 32306 USA
[6] PSL Res Univ, Ecole Normale Super, CNRS, INRIA, Paris, France
Funding
US National Science Foundation; European Research Council;
Keywords
Heavy tails; SGD; algorithmic stability; SDEs; DISTRIBUTIONS;
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Subject classification code
0812;
Abstract
Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error. While these studies have shed light on interesting aspects of the generalization behavior in modern settings, they relied on strong topological and statistical regularity assumptions, which are hard to verify in practice. Furthermore, it has been empirically illustrated that the relation between heavy tails and generalization might not always be monotonic in practice, contrary to the conclusions of existing theory. In this study, we establish novel links between the tail behavior and generalization properties of stochastic gradient descent (SGD) through the lens of algorithmic stability. We consider a quadratic optimization problem and use a heavy-tailed stochastic differential equation (and its Euler discretization) as a proxy for modeling the heavy-tailed behavior emerging in SGD. We then prove uniform stability bounds, which reveal the following outcomes: (i) Without making any exotic assumptions, we show that SGD will not be stable if stability is measured with the squared loss x ↦ x^2, whereas it becomes stable if stability is instead measured with a surrogate loss x ↦ |x|^p for some p < 2. (ii) Depending on the variance of the data, there exists a 'threshold of heavy-tailedness' such that the generalization error decreases as the tails become heavier, as long as the tails are lighter than this threshold. This suggests that the relation between heavy tails and generalization is not globally monotonic. (iii) We prove matching lower bounds on uniform stability, implying that our bounds are tight in terms of the heaviness of the tails. We support our theory with synthetic and real neural network experiments.
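As a rough illustration of the setup described in the abstract, the following Python sketch (not taken from the paper) simulates the Euler discretization of an α-stable-driven SDE on a simple quadratic least-squares problem and compares the stability gap measured with the squared loss against the gap measured with a surrogate loss |x|^p for p < 2. The step size, the stability index α = 1.6, the exponent p = 1.5, and the shared-noise coupling of the two runs are illustrative assumptions, not values or constructions from the paper.

```python
# Minimal illustrative sketch (not the authors' code): Euler discretization of an
# alpha-stable-driven SDE as a proxy for heavy-tailed SGD on a one-dimensional
# quadratic (least-squares) objective, compared on two datasets that differ in a
# single example. All parameter values below are illustrative choices.
import numpy as np
from scipy.stats import levy_stable

def run_heavy_tailed_iterates(a, b, alpha=1.6, eta=0.01, n_iter=5000, seed=0):
    """Euler scheme x_{k+1} = x_k - eta * grad_k + eta**(1/alpha) * S_k,
    where grad_k is the gradient of 0.5 * mean((a_i * x - b_i)^2) and
    S_k is symmetric alpha-stable noise (the heavy-tailed proxy)."""
    x = 0.0
    noise = levy_stable.rvs(alpha, 0.0, size=n_iter, random_state=seed)
    for k in range(n_iter):
        grad = np.mean(a * (a * x - b))   # gradient of the quadratic loss
        x = x - eta * grad + eta ** (1.0 / alpha) * noise[k]
    return x

rng = np.random.default_rng(1)
n = 200
a, b = rng.normal(size=n), rng.normal(size=n)
a2, b2 = a.copy(), b.copy()
a2[0], b2[0] = rng.normal(), rng.normal()   # neighbouring dataset: one sample replaced

# A shared noise seed couples the two runs, mimicking the coupling used in
# stability arguments.
x1 = run_heavy_tailed_iterates(a, b)
x2 = run_heavy_tailed_iterates(a2, b2)

p = 1.5  # surrogate exponent, p < 2
print("squared-loss gap :", abs(x1 ** 2 - x2 ** 2))
print("|x|^p gap (p=1.5):", abs(abs(x1) ** p - abs(x2) ** p))
```

A single run is only anecdotal; one would average such gaps over many seeds and data draws to observe the trends that the paper's stability bounds describe.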
Pages: 1292 - 1342
Number of pages: 51
Related papers
50 records in total
  • [1] First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
    Thanh Huy Nguyen
    Simsekli, Umut
    Gurbuzbalaban, Mert
    Richard, Gael
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [2] ON LEAST SQUARES ESTIMATION UNDER HETEROSCEDASTIC AND HEAVY-TAILED ERRORS
    Kuchibhotla, Arun K.
    Patra, Rohit K.
    [J]. ANNALS OF STATISTICS, 2022, 50 (01): 277 - 302
  • [3] Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent
    Lim, Soon Hoe
    Wan, Yijun
    Simsekli, Umut
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] CONVERGENCE RATES OF LEAST SQUARES REGRESSION ESTIMATORS WITH HEAVY-TAILED ERRORS
    Han, Qiyang
    Wellner, Jon A.
    [J]. ANNALS OF STATISTICS, 2019, 47 (04): 2286 - 2319
  • [5] Algorithms with Gradient Clipping for Stochastic Optimization with Heavy-Tailed Noise
    Danilova, M.
    [J]. DOKLADY MATHEMATICS, 2023, 108 (SUPPL 2): S248 - S256
  • [6] ON THE REGULARIZATION EFFECT OF STOCHASTIC GRADIENT DESCENT APPLIED TO LEAST-SQUARES
    Steinerberger, Stefan
    [J]. ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2021, 54: 610 - 619
  • [7] Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping
    Gorbunov, Eduard
    Danilova, Marina
    Gasnikov, Alexander
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] Stochastic PDEs with heavy-tailed noise
    Chong, Carsten
    [J]. STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 2017, 127 (07): 2262 - 2280
  • [9] Stochastic Scheduling of Heavy-tailed Jobs
    Im, Sungjin
    Moseley, Benjamin
    Pruhs, Kirk
    [J]. 32ND INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2015), 2015, 30: 474 - 486