Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

Cited: 0
Authors
Wu, Jingfeng [1 ]
Braverman, Vladimir [2 ]
Lee, Jason D. [3 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Rice Univ, Houston, TX 77005 USA
[3] Princeton Univ, Princeton, NJ 08544 USA
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set to be large, resulting in non-monotonic losses induced by the GD iterates. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime. Despite the presence of local oscillations, we prove that the logistic loss can be minimized by GD with any constant stepsize over a long time scale. Furthermore, we prove that with any constant stepsize, the GD iterates tend to infinity when projected to a max-margin direction (the hard-margin SVM direction) and converge to a fixed vector that minimizes a strongly convex potential when projected to the orthogonal complement of the max-margin direction. In contrast, we also show that in the EoS regime, GD iterates may diverge catastrophically under the exponential loss, highlighting the superiority of the logistic loss. These theoretical findings are in line with numerical simulations and complement existing theories on the convergence and implicit bias of GD for logistic regression, which are only applicable when the stepsizes are sufficiently small.
Pages: 28
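
To make the setting described in the abstract concrete, the following minimal NumPy sketch runs constant-stepsize GD on the logistic loss over a toy linearly separable dataset and tracks the iterate's components along and orthogonal to the max-margin (hard-margin SVM) direction. The data points, stepsize, and iteration budget are illustrative assumptions chosen for this sketch, not values taken from the paper.

import numpy as np

# Toy linearly separable data: each row is y_i * x_i (labels folded in).
# For these two points the hard-margin SVM solution is w* = (1, 0), so the
# max-margin direction is (1, 0).  Data, stepsize, and horizon are
# illustrative assumptions, not values from the paper.
Z = np.array([[1.0, 2.0],
              [1.0, -1.0]])
svm_dir = np.array([1.0, 0.0])

def logistic_loss(w):
    return np.mean(np.log1p(np.exp(-Z @ w)))

def gradient(w):
    s = 1.0 / (1.0 + np.exp(Z @ w))            # = sigmoid(-z_i . w)
    return -(Z * s[:, None]).mean(axis=0)

eta = 10.0                                      # large constant stepsize
w = np.zeros(2)
for t in range(1, 5001):
    w = w - eta * gradient(w)
    if t % 1000 == 0:
        margin_comp = w @ svm_dir               # grows without bound
        ortho_comp = w - margin_comp * svm_dir  # settles toward a fixed vector
        print(f"t={t:5d}  loss={logistic_loss(w):.3e}  "
              f"margin={margin_comp:7.3f}  orthogonal={ortho_comp}")

In line with the results stated above, the printed margin component keeps growing while the orthogonal component approaches a fixed vector, and the loss is driven toward zero over the long horizon (possibly non-monotonically early on) even though the stepsize is far larger than classical smoothness-based analyses would allow.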
Related Papers
50 items in total
  • [1] The implicit bias of gradient descent on nonseparable data
    Ji, Ziwei
    Telgarsky, Matus
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [2] The Implicit Bias of Gradient Descent on Separable Data
    Soudry, Daniel
    Hoffer, Elad
    Nacson, Mor Shpigel
    Gunasekar, Suriya
    Srebro, Nathan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [3] On the Implicit Bias of Gradient Descent for Temporal Extrapolation
    Cohen-Karlik, Edo
    Ben David, Avichai
    Cohen, Nadav
    Globerson, Amir
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [4] Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
    Li, Zhiyuan
    Wang, Tianhao
    Lee, Jason D.
    Arora, Sanjeev
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [5] Implicit Bias of Gradient Descent on Linear Convolutional Networks
    Gunasekar, Suriya
    Lee, Jason D.
    Soudry, Daniel
    Srebro, Nathan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [6] Nonconvex Sparse Logistic Regression via Proximal Gradient Descent
    Shen, Xinyue
    Gu, Yuantao
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 4079-4083
  • [7] Gradient Descent Converges Linearly for Logistic Regression on Separable Data
    Axiotis, Kyriakos
    Sviridenko, Maxim
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [8] Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks
    Jin, Hui
    Montufar, Guido
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [9] Semi-Supervised Additive Logistic Regression: A Gradient Descent Solution
    Song, Yangqiu
    Cai, Qutang
    Nie, Feiping
    Zhang, Changshui
    TSINGHUA SCIENCE AND TECHNOLOGY, 2007, (06): 638-646