The Implicit Bias of Gradient Descent on Separable Data

Cited by: 0
Authors
Soudry, Daniel [1]
Hoffer, Elad [1]
Nacson, Mor Shpigel [1]
Gunasekar, Suriya [2]
Srebro, Nathan [2]
Affiliations
[1] Technion, Dept Elect Engn, IL-320003 Haifa, Israel
[2] Toyota Technol Inst, Chicago, IL 60637 USA
Funding
US National Science Foundation; Israel Science Foundation
Keywords
gradient descent; implicit regularization; generalization; margin; logistic regression;
DOI
Not available
Chinese Library Classification
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
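
The rate claim in the abstract can be made concrete. As a sketch of the form the paper's analysis takes (the precise constants and conditions are in the paper itself), for the logistic loss on a linearly separable dataset the gradient descent iterates grow as

    w(t) = \hat{w} \log t + \rho(t), \qquad \|\rho(t)\| = O(1),

where \hat{w} is the L2 max-margin (hard-margin SVM) solution. The loss therefore decays quickly, L(w(t)) = O(1/t), while the direction gap \| w(t)/\|w(t)\| - \hat{w}/\|\hat{w}\| \| shrinks only as O(1/\log t): the direction converges logarithmically slowly relative to the loss.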
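
Both effects are easy to reproduce numerically. Below is a minimal NumPy sketch (not the authors' code; the two-cluster toy dataset, step size, and checkpoint schedule are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    # Two well-separated Gaussian clouds in 2D; labels y in {-1, +1}.
    X = np.vstack([rng.normal([2.0, 2.0], 0.3, (20, 2)),
                   rng.normal([-2.0, -2.0], 0.3, (20, 2))])
    y = np.concatenate([np.ones(20), -np.ones(20)])

    def loss_and_grad(w):
        # Logistic loss (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) and its gradient.
        m = y * (X @ w)                          # margins y_i <w, x_i>
        loss = np.mean(np.logaddexp(0.0, -m))    # stable log(1 + e^{-m})
        grad = -(X.T @ (y / (1.0 + np.exp(m)))) / len(y)
        return loss, grad

    w = np.zeros(2)
    for t in range(1, 100_001):
        loss, grad = loss_and_grad(w)
        w -= 0.1 * grad
        if t in (10, 100, 1_000, 10_000, 100_000):
            d = w / np.linalg.norm(w)
            print(f"t={t:>6}  loss={loss:.2e}  ||w||={np.linalg.norm(w):6.2f}  dir={d}")
    # Expected behavior: the loss falls fast while ||w|| grows roughly like log t,
    # and dir keeps drifting toward the max-margin direction (about [1, 1]/sqrt(2)
    # for this symmetric dataset), with the gap shrinking only like 1/log t.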
Pages: 57
Related Papers (50 total)
  • [21] An implicit gradient-descent procedure for minimax problems
    Essid, Montacer
    Tabak, Esteban G.
    Trigila, Giulio
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2023, 97 : 57 - 89
  • [22] Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks
    Jin, Hui
    Montufar, Guido
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [23] On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
    Azulay, Shahar
    Moroshko, Edward
    Nacson, Mor Shpigel
    Woodworth, Blake
    Srebro, Nathan
    Globerson, Amir
    Soudry, Daniel
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [24] Efficient gradient descent algorithm with Anderson acceleration for separable nonlinear models
    Chen, Guang-Yong
    Lin, Xin
    Xue, Peng
    Gan, Min
    NONLINEAR DYNAMICS, 2025, 113 (10) : 11371 - 11387
  • [25] Learning a Single Neuron with Bias Using Gradient Descent
    Vardi, Gal
    Yehudai, Gilad
    Shamir, Ohad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [26] Scalable statistical inference for averaged implicit stochastic gradient descent
    Fang, Yixin
    SCANDINAVIAN JOURNAL OF STATISTICS, 2019, 46 (04) : 987 - 1002
  • [27] STOCHASTIC GRADIENT DESCENT FOR SPECTRAL EMBEDDING WITH IMPLICIT ORTHOGONALITY CONSTRAINT
    El Gheche, Mireille
    Chierchia, Giovanni
    Frossard, Pascal
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3567 - 3571
  • [28] An Accelerated Coordinate Gradient Descent Algorithm for Non-separable Composite Optimization
    Aberdam, Aviad
    Beck, Amir
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2022, 193 (1-3) : 219 - 246
  • [30] Does Momentum Change the Implicit Regularization on Separable Data?
    Wang, Bohan
    Meng, Qi
    Zhang, Huishuai
    Sun, Ruoyu
    Chen, Wei
    Ma, Zhi-Ming
    Liu, Tie-Yan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022