Gradient methods never overfit on separable data

Cited: 0
Author
Shamir, Ohad [1]
Affiliation
[1] Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
Funding
European Research Council
Keywords
Optimization; Stochastic systems; Large dataset
DOI
Not available
Abstract
A line of recent works established that when training linear predictors over separable data using gradient methods and exponentially-tailed losses, the predictors asymptotically converge in direction to the max-margin predictor. As a consequence, the predictors asymptotically do not overfit. However, this does not address the question of whether overfitting might occur non-asymptotically, after some bounded number of iterations. In this paper, we formally show that standard gradient methods (in particular, gradient flow, gradient descent, and stochastic gradient descent) never overfit on separable data: if we run these methods for T iterations on a dataset of size m, both the empirical risk and the generalization error decrease at an essentially optimal rate of Õ(1/(γ²T)), where γ is the margin of the dataset, up to T ≈ m, at which point the generalization error remains fixed at an essentially optimal level of Õ(1/(γ²m)) regardless of how large T is. Along the way, we present non-asymptotic bounds on the number of margin violations over the dataset, and prove their tightness. © 2021 Ohad Shamir.
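To make the setting of the abstract concrete, below is a minimal, self-contained sketch (not code from the paper): gradient descent with the logistic loss, an exponentially-tailed loss, on a linearly separable dataset, reporting the empirical risk and the minimum normalized margin as the number of iterations grows. The data distribution, step size, and iteration counts are illustrative assumptions, not quantities taken from the paper.

```python
# Sketch: gradient descent on linearly separable data with the logistic loss.
# Illustrates the setting of the abstract; all constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Separable data: labels determined by a ground-truth unit direction w_star.
m, d = 100, 5
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(m, d))
y = np.sign(X @ w_star)

def empirical_risk(w):
    # Logistic (exponentially-tailed) loss averaged over the dataset.
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def min_margin(w):
    # Minimum normalized margin; negative values are margin violations.
    return np.min(y * (X @ w)) / (np.linalg.norm(w) + 1e-12)

w = np.zeros(d)
eta = 1.0  # step size (assumption)
for t in range(1, 10001):
    p = 1.0 / (1.0 + np.exp(y * (X @ w)))        # per-example loss derivatives
    grad = -(X * (y * p)[:, None]).mean(axis=0)  # gradient of the empirical risk
    w -= eta * grad
    if t in (10, 100, 1000, 10000):
        print(f"t={t:6d}  risk={empirical_risk(w):.4f}  min margin={min_margin(w):.4f}")
```

Running the sketch, the empirical risk keeps shrinking with T while the minimum normalized margin becomes and stays positive, consistent with the paper's message that more iterations do not cause overfitting on separable data.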
Related Papers (50 total)
  • [1] Gradient Methods Never Overfit On Separable Data
    Shamir, Ohad
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [2] Convergence of Gradient Descent on Separable Data
    Nacson, Mor Shpigel
    Lee, Jason D.
    Gunasekar, Suriya
    Savarese, Pedro H. P.
    Srebro, Nathan
    Soudry, Daniel
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [3] The Implicit Bias of Gradient Descent on Separable Data
    Soudry, Daniel
    Hoffer, Elad
    Nacson, Mor Shpigel
    Gunasekar, Suriya
    Srebro, Nathan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [4] Tight Risk Bounds for Gradient Descent on Separable Data
    Schliserman, Matan
    Koren, Tomer
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Gradient Descent Converges Linearly for Logistic Regression on Separable Data
    Axiotis, Kyriakos
    Sviridenko, Maxim
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [6] Separable synthesis gradient estimation methods and convergence analysis for multivariable systems
    Xu, Ling
    Ding, Feng
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2023, 427
  • [7] Commentary: To underfit and to overfit the data. This is the dilemma
    Benedetto, Umberto
    Dimagli, Arnaldo
    JOURNAL OF THORACIC AND CARDIOVASCULAR SURGERY, 2020, 160 (01): : 183 - 183
  • [8] Convergence of online gradient methods for continuous perceptrons with linearly separable training patterns
    Wu, W
    Shao, ZQ
    APPLIED MATHEMATICS LETTERS, 2003, 16 (07) : 999 - 1002
  • [9] Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
    Nacson, Mor Shpigel
    Srebro, Nathan
    Soudry, Daniel
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89