The Implicit Bias of Gradient Descent on Separable Data

Cited by: 0
Authors
Soudry, Daniel [1]
Hoffer, Elad [1]
Nacson, Mor Shpigel [1]
Gunasekar, Suriya [2]
Srebro, Nathan [2]
Affiliations
[1] Technion, Dept Elect Engn, IL-3200003 Haifa, Israel
[2] Toyota Technol Inst, Chicago, IL 60637 USA
Funding
U.S. National Science Foundation; Israel Science Foundation
Keywords
gradient descent; implicit regularization; generalization; margin; logistic regression
DOI
Not available
CLC classification
TP [Automation Technology, Computer Technology]
Subject classification
0812
Abstract
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
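The abstract's two claims, directional convergence to the max-margin separator and its logarithmically slow rate, can be checked numerically. Below is a minimal sketch (an illustration written for this record, not the authors' code; the dataset, step size, and checkpoints are arbitrary choices) that runs gradient descent on the unregularized logistic loss over a separable two-cluster dataset and tracks the gap between the normalized iterate w(t)/||w(t)|| and the hard-margin SVM direction, here approximated with scikit-learn.

```python
# Minimal sketch (not code from the paper): gradient descent on
# unregularized logistic loss over linearly separable data. ||w(t)||
# diverges while w(t)/||w(t)|| drifts toward the max-margin direction.
import numpy as np
from sklearn.svm import SVC  # used only to get the reference max-margin direction

rng = np.random.default_rng(0)

# Two well-separated Gaussian clusters in the plane, labels in {-1, +1}.
X = np.vstack([rng.normal(+2.0, 0.3, size=(20, 2)),
               rng.normal(-2.0, 0.3, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

# Reference direction: hard-margin SVM, approximated with a very large C.
w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()
w_svm = w_svm / np.linalg.norm(w_svm)

def grad(w):
    """Gradient of the mean logistic loss (1/n) sum_i log(1 + exp(-y_i <w, x_i>))."""
    m = y * (X @ w)  # per-example margins
    return -(X.T @ (y / (1.0 + np.exp(m)))) / len(y)

w = np.zeros(2)
eta = 0.5  # safe step size for this data scale
for t in range(1, 100_001):
    w -= eta * grad(w)
    if t in (100, 1_000, 10_000, 100_000):
        gap = np.linalg.norm(w / np.linalg.norm(w) - w_svm)
        print(f"t={t:>7}  ||w||={np.linalg.norm(w):7.2f}  direction gap={gap:.4f}")

# Expected behavior: ||w(t)|| keeps growing (roughly like log t) and the
# training loss keeps shrinking, yet the direction gap closes only very
# slowly, consistent with the O(1/log t) rate established in the paper.
```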
Pages: 57
Related papers
50 records in total (first 10 shown)
  • [1] The implicit bias of gradient descent on separable data
    Soudry, Daniel; Hoffer, Elad; Nacson, Mor Shpigel; Gunasekar, Suriya; Srebro, Nathan
    Journal of Machine Learning Research, 2018, Vol. 19
  • [2] The implicit bias of gradient descent on nonseparable data
    Ji, Ziwei; Telgarsky, Matus
    Conference on Learning Theory (COLT), 2019, Vol. 99
  • [3] Convergence of Gradient Descent on Separable Data
    Nacson, Mor Shpigel; Lee, Jason D.; Gunasekar, Suriya; Savarese, Pedro H. P.; Srebro, Nathan; Soudry, Daniel
    22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019, Vol. 89
  • [4] On the Implicit Bias of Gradient Descent for Temporal Extrapolation
    Cohen-Karlik, Edo; Ben David, Avichai; Cohen, Nadav; Globerson, Amir
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2022, Vol. 151
  • [5] The Implicit Bias of AdaGrad on Separable Data
    Qian, Qian; Qian, Xiaoyuan
    Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019
  • [6] Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
    Li, Zhiyuan; Wang, Tianhao; Lee, Jason D.; Arora, Sanjeev
    Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022
  • [7] Implicit Bias of Gradient Descent on Linear Convolutional Networks
    Gunasekar, Suriya; Lee, Jason D.; Soudry, Daniel; Srebro, Nathan
    Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018
  • [8] Tight Risk Bounds for Gradient Descent on Separable Data
    Schliserman, Matan; Koren, Tomer
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [9] Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
    Wu, Jingfeng; Braverman, Vladimir; Lee, Jason D.
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [10] Gradient Descent Converges Linearly for Logistic Regression on Separable Data
    Axiotis, Kyriakos; Sviridenko, Maxim
    International Conference on Machine Learning (ICML), 2023, Vol. 202