Train faster, generalize better: Stability of stochastic gradient descent

Cited by: 0
Authors:
Hardt, Moritz
Recht, Benjamin
Singer, Yoram
Keywords:
BOUNDS
DOI:
Not available
CLC number:
TP18 [Artificial Intelligence Theory]
Subject classification codes:
081104; 0812; 0835; 1405
Abstract
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. We derive stability bounds for both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Applying our results to the convex case, we provide new insights for why multiple epochs of stochastic gradient methods generalize well in practice. In the non-convex case, we give a new interpretation of common practices in neural networks, and formally show that popular techniques for training large deep models are indeed stability-promoting. Our findings conceptually underscore the importance of reducing training time beyond its obvious benefit.
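For context, the paper's convex-case guarantee is quantitative: for an L-Lipschitz, β-smooth convex loss run with step sizes α_t ≤ 2/β, SGM is uniformly stable with stability parameter on the order of (2L²/n) · Σ_t α_t, so the generalization gap shrinks with the sample size n and grows only with the accumulated step sizes. The sketch below is an illustrative probe of this notion, not code from the paper: it trains the same stochastic gradient method (logistic regression on synthetic data; all names and hyperparameters here are arbitrary choices for illustration) on two datasets that differ in a single example, the "neighboring" pair in the Bousquet-Elisseeff definition, and reports how far the learned parameters diverge.

import numpy as np

def sgm(X, y, lr=0.01, epochs=5, seed=0):
    # Stochastic gradient method on the logistic loss. A fixed seed gives the
    # same per-epoch sample order in both runs, coupling them the way the
    # stability argument does.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            z = X[i] @ w
            grad = (1.0 / (1.0 + np.exp(-z)) - y[i]) * X[i]  # d/dw of log(1+e^z) - y*z
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.normal(size=(n, d))               # hypothetical synthetic features
y = (rng.random(n) < 0.5).astype(float)   # hypothetical binary labels

# Neighboring dataset: identical except for one replaced example.
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)
y2[0] = 1.0 - y2[0]

w1 = sgm(X, y)
w2 = sgm(X2, y2)
# For a Lipschitz loss, this parameter divergence bounds (up to the Lipschitz
# constant) the change in loss on any example, which is exactly the quantity
# uniform stability controls.
print("parameter divergence:", np.linalg.norm(w1 - w2))

Under the assumptions above, rerunning with more epochs or larger step sizes should increase the reported divergence, while a larger n should shrink it, in line with the (2L²/n) · Σ_t α_t scaling.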
Pages: 10
Related papers (50 in total):
  • [1] Train simultaneously, generalize better: Stability of gradient-based minimax learners
    Farnia, Farzan
    Ozdaglar, Asuman
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] Towards stability and optimality in stochastic gradient descent
    Toulis, Panos
    Tran, Dustin
    Airoldi, Edoardo M.
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 1290 - 1298
  • [3] Stability and Generalization of Decentralized Stochastic Gradient Descent
    Sun, Tao
    Li, Dongsheng
    Wang, Bao
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI 2021), 2021, 35 : 9756 - 9764
  • [4] Global Convergence and Stability of Stochastic Gradient Descent
    Patel, Vivak
    Zhang, Shushu
    Tian, Bowen
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [5] Data-Dependent Stability of Stochastic Gradient Descent
    Kuzborskij, Ilja
    Lampert, Christoph H.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [6] Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
    Bassily, Raef
    Feldman, Vitaly
    Guzman, Cristobal
    Talwar, Kunal
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [7] Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
    Zhu, Miaoxi
    Shen, Li
    Du, Bo
    Tao, Dacheng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [8] Stability and optimization error of stochastic gradient descent for pairwise learning
    Shen, Wei
    Yang, Zhenhuan
    Ying, Yiming
    Yuan, Xiaoming
    [J]. ANALYSIS AND APPLICATIONS, 2020, 18 (05) : 887 - 927
  • [9] Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent
    Shen, Shuheng
    Xu, Linli
    Liu, Jingchang
    Liang, Xianfeng
    Cheng, Yifei
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4582 - 4589
  • [10] How to train a discriminative front end with stochastic gradient descent and maximum mutual information
    Droppo, J
    Mahajan, M
    Gunawardana, A
    Acero, A
    [J]. 2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005, : 41 - 46