The Implicit Bias of Gradient Descent on Separable Data

Cited by: 0
Authors
Soudry, Daniel [1]
Hoffer, Elad [1]
Nacson, Mor Shpigel [1]
Gunasekar, Suriya [2]
Srebro, Nathan [2]
Affiliations
[1] Technion, Dept Elect Engn, IL-3200003 Haifa, Israel
[2] Toyota Technol Inst, Chicago, IL 60637 USA
Funding
U.S. National Science Foundation; Israel Science Foundation
Keywords
gradient descent; implicit regularization; generalization; margin; logistic regression
DOI
Not available
CLC classification
TP [Automation Technology, Computer Technology]
Subject classification
0812
Abstract
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
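The abstract's two claims, directional convergence to the max-margin separator and its logarithmically slow rate, can be checked numerically. Below is a minimal sketch (an illustration written for this record, not the authors' code; the dataset, step size, and checkpoints are arbitrary choices) that runs gradient descent on the unregularized logistic loss over a separable two-cluster dataset and tracks the gap between the normalized iterate w(t)/||w(t)|| and the hard-margin SVM direction, here approximated with scikit-learn.

```python
# Minimal sketch (not code from the paper): gradient descent on
# unregularized logistic loss over linearly separable data. ||w(t)||
# diverges while w(t)/||w(t)|| drifts toward the max-margin direction.
import numpy as np
from sklearn.svm import SVC  # used only to get the reference max-margin direction

rng = np.random.default_rng(0)

# Two well-separated Gaussian clusters in the plane, labels in {-1, +1}.
X = np.vstack([rng.normal(+2.0, 0.3, size=(20, 2)),
               rng.normal(-2.0, 0.3, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

# Reference direction: hard-margin SVM, approximated with a very large C.
w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()
w_svm = w_svm / np.linalg.norm(w_svm)

def grad(w):
    """Gradient of the mean logistic loss (1/n) sum_i log(1 + exp(-y_i <w, x_i>))."""
    m = y * (X @ w)  # per-example margins
    return -(X.T @ (y / (1.0 + np.exp(m)))) / len(y)

w = np.zeros(2)
eta = 0.5  # safe step size for this data scale
for t in range(1, 100_001):
    w -= eta * grad(w)
    if t in (100, 1_000, 10_000, 100_000):
        gap = np.linalg.norm(w / np.linalg.norm(w) - w_svm)
        print(f"t={t:>7}  ||w||={np.linalg.norm(w):7.2f}  direction gap={gap:.4f}")

# Expected behavior: ||w(t)|| keeps growing (roughly like log t) and the
# training loss keeps shrinking, yet the direction gap closes only very
# slowly, consistent with the O(1/log t) rate established in the paper.
```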
Pages: 57
Related papers
50 records in total (first 10 shown)
  • [1] The implicit bias of gradient descent on separable data
    Soudry, Daniel; Hoffer, Elad; Nacson, Mor Shpigel; Gunasekar, Suriya; Srebro, Nathan
    Journal of Machine Learning Research, 2018, Vol. 19
  • [2] The implicit bias of gradient descent on nonseparable data
    Ji, Ziwei; Telgarsky, Matus
    Conference on Learning Theory (COLT), 2019, Vol. 99
  • [3] Convergence of Gradient Descent on Separable Data
    Nacson, Mor Shpigel; Lee, Jason D.; Gunasekar, Suriya; Savarese, Pedro H. P.; Srebro, Nathan; Soudry, Daniel
    22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019, Vol. 89
  • [4] On the Implicit Bias of Gradient Descent for Temporal Extrapolation
    Cohen-Karlik, Edo; Ben David, Avichai; Cohen, Nadav; Globerson, Amir
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2022, Vol. 151
  • [5] The Implicit Bias of AdaGrad on Separable Data
    Qian, Qian; Qian, Xiaoyuan
    Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019
  • [6] Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
    Li, Zhiyuan; Wang, Tianhao; Lee, Jason D.; Arora, Sanjeev
    Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022
  • [7] Implicit Bias of Gradient Descent on Linear Convolutional Networks
    Gunasekar, Suriya; Lee, Jason D.; Soudry, Daniel; Srebro, Nathan
    Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018
  • [8] Tight Risk Bounds for Gradient Descent on Separable Data
    Schliserman, Matan; Koren, Tomer
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [9] Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
    Wu, Jingfeng; Braverman, Vladimir; Lee, Jason D.
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [10] Gradient Descent Converges Linearly for Logistic Regression on Separable Data
    Axiotis, Kyriakos; Sviridenko, Maxim
    International Conference on Machine Learning (ICML), 2023, Vol. 202