Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

Cited by: 0
Authors
Wu, Jingfeng [1 ]
Braverman, Vladimir [2 ]
Lee, Jason D. [3 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Rice Univ, Houston, TX 77005 USA
[3] Princeton Univ, Princeton, NJ 08544 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsize is set so large that the loss along the GD iterates behaves non-monotonically. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime. Despite the presence of local oscillations, we prove that the logistic loss can be minimized by GD with any constant stepsize over a long time scale. Furthermore, we prove that with any constant stepsize, the GD iterates tend to infinity when projected onto the max-margin direction (the hard-margin SVM direction) and converge to a fixed vector that minimizes a strongly convex potential when projected onto the orthogonal complement of the max-margin direction. In contrast, we also show that in the EoS regime, GD iterates may diverge catastrophically under the exponential loss, highlighting the superiority of the logistic loss. These theoretical findings are in line with numerical simulations and complement existing theories on the convergence and implicit bias of GD for logistic regression, which are only applicable when the stepsizes are sufficiently small.
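The behavior described in the abstract can be illustrated with a small numerical simulation. The following is a minimal sketch, not the authors' code: it runs constant-stepsize GD on the logistic loss over synthetic linearly separable 2-D data with a deliberately large stepsize. The data, the stepsize eta = 20.0, and the iteration count are illustrative assumptions rather than values from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable 2-D data with labels in {-1, +1};
# the two classes are pushed apart along the first coordinate.
n = 20
X = rng.normal(size=(n, 2))
y = np.sign(X[:, 0])
y[y == 0] = 1.0
X[:, 0] += 2.0 * y

def logistic_loss(w):
    # Numerically stable mean logistic loss: mean of log(1 + exp(-y <w, x>)).
    return np.logaddexp(0.0, -y * (X @ w)).mean()

def logistic_grad(w):
    margins = y * (X @ w)
    # sigmoid(-m) written as 0.5 * (1 - tanh(m / 2)) to avoid overflow.
    coeff = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return (coeff[:, None] * X).mean(axis=0)

eta = 20.0          # deliberately large constant stepsize (EoS-style choice)
w = np.zeros(2)
losses = []
for _ in range(5000):
    losses.append(logistic_loss(w))
    w -= eta * logistic_grad(w)

# The loss is typically non-monotonic early on but small by the end, while
# ||w|| keeps growing and its direction aligns with a max-margin direction.
print("final loss:", losses[-1])
print("||w|| =", np.linalg.norm(w), " direction:", w / np.linalg.norm(w))

Under this kind of setup, the loss tends to oscillate in the early iterations yet still decrease over a long horizon, matching the abstract's claims for the logistic loss; replacing the logistic loss with the exponential loss exp(-y <w, x>) at the same large stepsize tends instead to make the iterates blow up, consistent with the catastrophic divergence the abstract describes.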
Pages: 28