Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

Cited by: 0
Authors:
Wu, Jingfeng [1]
Braverman, Vladimir [2]
Lee, Jason D. [3]
Affiliations:
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Rice Univ, Houston, TX 77005 USA
[3] Princeton Univ, Princeton, NJ 08544 USA
Keywords: not listed
DOI: not available
Chinese Library Classification: TP18 (artificial intelligence theory)
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsize is large and the loss evolves non-monotonically along the GD iterates. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime. Despite the presence of local oscillations, we prove that the logistic loss can be minimized by GD with any constant stepsize over a long time scale. Furthermore, we prove that with any constant stepsize, the GD iterates tend to infinity when projected onto a max-margin direction (the hard-margin SVM direction) and converge to a fixed vector that minimizes a strongly convex potential when projected onto the orthogonal complement of the max-margin direction. In contrast, we also show that in the EoS regime, GD iterates may diverge catastrophically under the exponential loss, highlighting the superiority of the logistic loss. These theoretical findings are in line with numerical simulations and complement existing theories on the convergence and implicit bias of GD for logistic regression, which are only applicable when the stepsizes are sufficiently small.
Pages: 28
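The abstract's main claims can be illustrated numerically: constant-stepsize GD on the logistic loss over separable data eventually drives the loss toward zero (possibly non-monotonically at first), while the iterates grow in norm along a fixed max-margin-like direction. Below is a minimal simulation sketch, not the authors' code; the toy dataset, the stepsize eta = 10, and the iteration budget are illustrative assumptions.

# Minimal sketch (not the paper's code): constant-stepsize GD on the logistic
# loss for a linearly separable toy dataset. The data, the stepsize eta, and
# the iteration budget are assumptions for illustration only.
import numpy as np

# Labels are folded into the features (z_i = y_i * x_i), so the logistic loss
# is mean_i log(1 + exp(-<w, z_i>)), and the data are separable because some w
# makes every margin <w, z_i> positive.
Z = np.array([[2.0, 1.0],
              [1.5, -0.5],
              [3.0, 0.2],
              [2.5, 1.5]])
eta = 10.0          # deliberately large constant stepsize (assumed)
w = np.zeros(2)

def loss_and_grad(w):
    margins = Z @ w
    loss = np.logaddexp(0.0, -margins).mean()      # mean log(1 + e^{-margin})
    sigma = np.exp(-np.logaddexp(0.0, margins))    # 1 / (1 + e^{margin}), numerically stable
    grad = -(sigma[:, None] * Z).mean(axis=0)
    return loss, grad

losses = []
for _ in range(2000):
    loss, grad = loss_and_grad(w)
    losses.append(loss)
    w = w - eta * grad

# Per the abstract: the loss is eventually minimized for any constant stepsize,
# the loss need not decrease monotonically, and w/||w|| approaches a max-margin
# direction while ||w|| grows without bound.
print("final loss:", losses[-1])
print("loss monotone along the run?", all(a >= b for a, b in zip(losses, losses[1:])))
print("||w||:", np.linalg.norm(w), " direction:", w / np.linalg.norm(w))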
Related papers (50 in total; items [31] to [40] shown):
  • [31] On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent. Azulay, Shahar; Moroshko, Edward; Nacson, Mor Shpigel; Woodworth, Blake; Srebro, Nathan; Globerson, Amir; Soudry, Daniel. International Conference on Machine Learning, Vol. 139, 2021.
  • [32] A Model of Double Descent for High-Dimensional Logistic Regression. Deng, Zeyu; Kammoun, Abla; Thrampoulidis, Christos. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 4267-4271.
  • [33] Learning a Single Neuron with Bias Using Gradient Descent. Vardi, Gal; Yehudai, Gilad; Shamir, Ohad. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [34] Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data. Kou, Yiwen; Chen, Zixiang; Gu, Quanquan. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [35] Scalable statistical inference for averaged implicit stochastic gradient descent. Fang, Yixin. Scandinavian Journal of Statistics, 2019, 46(4): 987-1002.
  • [36] Automatic Detection of Concrete Spalling Using Piecewise Linear Stochastic Gradient Descent Logistic Regression and Image Texture Analysis. Nhat-Duc Hoang; Quoc-Lam Nguyen; Xuan-Linh Tran. Complexity, vol. 2019, 2019.
  • [37] Stochastic Gradient Descent for Spectral Embedding with Implicit Orthogonality Constraint. El Gheche, Mireille; Chierchia, Giovanni; Frossard, Pascal. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 3567-3571.
  • [38] Gradient descent algorithms for quantile regression with smooth approximation. Zheng, Songfeng. International Journal of Machine Learning and Cybernetics, 2011, 2(3): 191-207.
  • [39] Functional gradient descent for n-tuple regression. Katopodis, Rafael F.; Lima, Priscila M. V.; Franca, Felipe M. G. Neurocomputing, 2022, 500: 1016-1028.
  • [40] On the convergence of gradient descent for robust functional linear regression. Wang, Cheng; Fan, Jun. Journal of Complexity, 2024, 84.