Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

Cited: 0
Authors
Wu, Jingfeng [1 ]
Braverman, Vladimir [2 ]
Lee, Jason D. [3 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Rice Univ, Houston, TX 77005 USA
[3] Princeton Univ, Princeton, NJ 08544 USA
Keywords
DOI
Not available
CLC classification
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set to be large, resulting in non-monotonic losses induced by the GD iterates. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime. Despite the presence of local oscillations, we prove that the logistic loss can be minimized by GD with any constant stepsize over a long time scale. Furthermore, we prove that with any constant stepsize, the GD iterates tend to infinity when projected to a max-margin direction (the hard-margin SVM direction) and converge to a fixed vector that minimizes a strongly convex potential when projected to the orthogonal complement of the max-margin direction. In contrast, we also show that in the EoS regime, GD iterates may diverge catastrophically under the exponential loss, highlighting the superiority of the logistic loss. These theoretical findings are in line with numerical simulations and complement existing theories on the convergence and implicit bias of GD for logistic regression, which are only applicable when the stepsizes are sufficiently small.
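The abstract's claims can be illustrated numerically. The following is a minimal sketch, not the authors' code: constant-stepsize GD on the logistic loss over a small synthetic, linearly separable dataset with a deliberately large stepsize, tracking the loss, the component of the iterate along a crude stand-in for the max-margin (hard-margin SVM) direction, which the theory predicts grows without bound, and the orthogonal component, which is predicted to settle down. The dataset, the stepsize eta, and the surrogate direction svm_dir are assumptions for illustration only.

# Minimal sketch (not the authors' code): constant-stepsize GD on the logistic
# loss over synthetic linearly separable data with a deliberately large stepsize.
# The data, the stepsize eta, and the stand-in for the max-margin direction are
# illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Separable toy data: label each point by the sign of its first coordinate,
# then push it away from the boundary so a positive margin exists.
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] >= 0, 1.0, -1.0)
X[:, 0] += 0.3 * y
Z = y[:, None] * X                       # fold labels in: loss depends only on the margins z_i . w

def logistic_loss(w):
    # mean_i log(1 + exp(-z_i . w)), computed stably
    return np.logaddexp(0.0, -Z @ w).mean()

def logistic_grad(w):
    # d/dw mean_i log(1 + exp(-z_i . w)) = -mean_i sigmoid(-z_i . w) z_i
    p = np.exp(-np.logaddexp(0.0, Z @ w))    # sigmoid(-margin), stable
    return -(Z * p[:, None]).mean(axis=0)

# Crude stand-in for the max-margin direction (a real experiment would solve the
# hard-margin SVM exactly); by construction e_1 separates this dataset.
svm_dir = np.array([1.0, 0.0])

w = np.zeros(2)
eta = 40.0                                # large constant stepsize (EoS-style choice)
for t in range(1, 5001):
    w = w - eta * logistic_grad(w)
    if t == 1 or t % 1000 == 0:
        along = w @ svm_dir               # predicted to grow without bound
        perp = w - along * svm_dir        # predicted to settle to a fixed vector
        print(f"t={t:5d}  loss={logistic_loss(w):.4f}  "
              f"<w, svm_dir>={along:.2f}  ||w_perp||={np.linalg.norm(perp):.3f}")

How strongly the loss oscillates before shrinking depends on the data scale and on eta. Under the same large stepsize, swapping the logistic loss for the exponential loss exp(-z . w) can make the iterates blow up, which is the contrast the abstract draws between the two losses.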
Pages: 28
Related papers
50 items in total; items 21-30 shown
  • [21] Bias of Homotopic Gradient Descent for the Hinge Loss
    Molitor, Denali
    Needell, Deanna
    Ward, Rachel
    Applied Mathematics and Optimization, 2021, 84(1): 621-647
  • [23] Stability and Change in Implicit Bias
    Vuletich, Heidi A.
    Payne, B. Keith
    Psychological Science, 2019, 30(6): 854-862
  • [24] An implicit gradient-descent procedure for minimax problems
    Essid, Montacer
    Tabak, Esteban G.
    Trigila, Giulio
    Mathematical Methods of Operations Research, 2023, 97(1): 57-89
  • [25] The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models
    Wang, Li
    Fu, Zhiguo
    Zhou, Yingcong
    Yan, Zili
    Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023, 37(8): 10149-10156
  • [27] Stochastic Gradient Descent Meets Distribution Regression
    Muecke, Nicole
    24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021, 130
  • [28] Jackknife bias reduction for polychotomous logistic regression
    Bull, S. B.
    Greenwood, C. M. T.
    Hauck, W. W.
    Statistics in Medicine, 1997, 16(5): 545-560
  • [29] Comment on 'Bias reduction in conditional logistic regression'
    Sun, X.
    Sinha, S.
    Wang, S.
    Maiti, T.
    Statistics in Medicine, 2011, 30(12): 1466-1467
  • [30] Bias correction of AIC in logistic regression models
    Yanagihara, H.
    Sekiguchi, R.
    Fujikoshi, Y.
    Journal of Statistical Planning and Inference, 2003, 115(2): 349-360