The Implicit Bias of Gradient Descent on Separable Data

Cited: 0
Authors
Soudry, Daniel [1 ]
Hoffer, Elad [1 ]
Nacson, Mor Shpigel [1 ]
Gunasekar, Suriya [2 ]
Srebro, Nathan [2 ]
Affiliations
[1] Technion, Dept Elect Engn, IL-320003 Haifa, Israel
[2] Toyota Technol Inst, Chicago, IL 60637 USA
Funding
National Science Foundation (USA); Israel Science Foundation
Keywords
gradient descent; implicit regularization; generalization; margin; logistic regression;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
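The following is a minimal, self-contained Python sketch (not the authors' code) illustrating the result stated in the abstract: full-batch gradient descent on the unregularized logistic loss over linearly separable data drives the training loss toward zero while the direction of the weight vector slowly approaches the hard-margin SVM direction. The toy dataset, step size, iteration counts, and the use of scikit-learn's LinearSVC with a very large C as a stand-in for the hard-margin SVM are illustrative assumptions for the demo.

```python
# Sketch of the implicit-bias phenomenon: GD on unregularized logistic loss
# over separable data converges in direction to the max-margin solution.
# Dataset, step size, and iteration counts are illustrative choices.
import numpy as np
from sklearn.svm import LinearSVC  # used only as a hard-margin SVM stand-in

rng = np.random.default_rng(0)

# Toy linearly separable data: two well-separated Gaussian blobs, labels in {-1, +1}.
n = 100
X = np.vstack([
    rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(n, 2)),
    rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(n, 2)),
])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Full-batch gradient descent on the mean logistic loss, homogeneous linear
# predictor (no bias term), no regularization.
w = np.zeros(2)
lr = 0.1
for t in range(1, 200_001):
    margins = y * (X @ w)
    # Gradient of mean_i log(1 + exp(-margins_i)) with respect to w.
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad
    if t in (100, 1_000, 10_000, 100_000, 200_000):
        loss = np.logaddexp(0.0, -(y * (X @ w))).mean()
        print(f"t={t:>7}  loss={loss:.2e}  ||w||={np.linalg.norm(w):.2f}  "
              f"direction={w / np.linalg.norm(w)}")

# Hard-margin SVM direction for comparison. LinearSVC with a very large C is an
# assumed stand-in for the hard-margin solution on this separable toy data.
svm = LinearSVC(C=1e6, loss="hinge", fit_intercept=False, max_iter=100_000).fit(X, y)
w_svm = svm.coef_.ravel()
print("SVM direction:", w_svm / np.linalg.norm(w_svm))
print("GD  direction:", w / np.linalg.norm(w))
```

Consistent with the abstract's claim of logarithmically slow convergence, the printed loss falls rapidly while the normalized gradient-descent direction closes the gap to the SVM direction only gradually as the iteration count grows by orders of magnitude.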
Pages: 57