The Implicit Bias of Gradient Descent on Separable Data

Cited by: 0
Authors
Soudry, Daniel [1]
Hoffer, Elad [1]
Nacson, Mor Shpigel [1]
Gunasekar, Suriya [2]
Srebro, Nathan [2]
Affiliations
[1] Technion, Dept Elect Engn, IL-320003 Haifa, Israel
[2] Toyota Technol Inst, Chicago, IL 60637 USA
Funding
US National Science Foundation; Israel Science Foundation
Keywords
gradient descent; implicit regularization; generalization; margin; logistic regression;
DOI
Not available
Chinese Library Classification
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
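
The rate claim in the abstract can be made concrete. As a sketch of the form the paper's analysis takes (the precise constants and conditions are in the paper itself), for the logistic loss on a linearly separable dataset the gradient descent iterates grow as

    w(t) = \hat{w} \log t + \rho(t), \qquad \|\rho(t)\| = O(1),

where \hat{w} is the L2 max-margin (hard-margin SVM) solution. The loss therefore decays quickly, L(w(t)) = O(1/t), while the direction gap \| w(t)/\|w(t)\| - \hat{w}/\|\hat{w}\| \| shrinks only as O(1/\log t): the direction converges logarithmically slowly relative to the loss.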
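
Both effects are easy to reproduce numerically. Below is a minimal NumPy sketch (not the authors' code; the two-cluster toy dataset, step size, and checkpoint schedule are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    # Two well-separated Gaussian clouds in 2D; labels y in {-1, +1}.
    X = np.vstack([rng.normal([2.0, 2.0], 0.3, (20, 2)),
                   rng.normal([-2.0, -2.0], 0.3, (20, 2))])
    y = np.concatenate([np.ones(20), -np.ones(20)])

    def loss_and_grad(w):
        # Logistic loss (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) and its gradient.
        m = y * (X @ w)                          # margins y_i <w, x_i>
        loss = np.mean(np.logaddexp(0.0, -m))    # stable log(1 + e^{-m})
        grad = -(X.T @ (y / (1.0 + np.exp(m)))) / len(y)
        return loss, grad

    w = np.zeros(2)
    for t in range(1, 100_001):
        loss, grad = loss_and_grad(w)
        w -= 0.1 * grad
        if t in (10, 100, 1_000, 10_000, 100_000):
            d = w / np.linalg.norm(w)
            print(f"t={t:>6}  loss={loss:.2e}  ||w||={np.linalg.norm(w):6.2f}  dir={d}")
    # Expected behavior: the loss falls fast while ||w|| grows roughly like log t,
    # and dir keeps drifting toward the max-margin direction (about [1, 1]/sqrt(2)
    # for this symmetric dataset), with the gap shrinking only like 1/log t.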
Pages: 57
Related Papers (50 total)
  • [21] An implicit gradient-descent procedure for minimax problems
    Essid, Montacer
    Tabak, Esteban G.
    Trigila, Giulio
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2023, 97 : 57 - 89
  • [22] Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks
    Jin, Hui
    Montufar, Guido
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [23] On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
    Azulay, Shahar
    Moroshko, Edward
    Nacson, Mor Shpigel
    Woodworth, Blake
    Srebro, Nathan
    Globerson, Amir
    Soudry, Daniel
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [24] Efficient gradient descent algorithm with Anderson acceleration for separable nonlinear models
    Chen, Guang-Yong
    Lin, Xin
    Xue, Peng
    Gan, Min
    NONLINEAR DYNAMICS, 2025, 113 (10) : 11371 - 11387
  • [25] Learning a Single Neuron with Bias Using Gradient Descent
    Vardi, Gal
    Yehudai, Gilad
    Shamir, Ohad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [26] Scalable statistical inference for averaged implicit stochastic gradient descent
    Fang, Yixin
    SCANDINAVIAN JOURNAL OF STATISTICS, 2019, 46 (04) : 987 - 1002
  • [27] STOCHASTIC GRADIENT DESCENT FOR SPECTRAL EMBEDDING WITH IMPLICIT ORTHOGONALITY CONSTRAINT
    El Gheche, Mireille
    Chierchia, Giovanni
    Frossard, Pascal
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3567 - 3571
  • [28] An Accelerated Coordinate Gradient Descent Algorithm for Non-separable Composite Optimization
    Aberdam, Aviad
    Beck, Amir
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2022, 193 (1-3) : 219 - 246
  • [30] Does Momentum Change the Implicit Regularization on Separable Data?
    Wang, Bohan
    Meng, Qi
    Zhang, Huishuai
    Sun, Ruoyu
    Chen, Wei
    Ma, Zhi-Ming
    Liu, Tie-Yan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022