Train faster, generalize better: Stability of stochastic gradient descent

Cited by: 0
Authors:
Hardt, Moritz
Recht, Benjamin
Singer, Yoram
Keywords:
BOUNDS
DOI:
Not available
CLC number:
TP18 [Artificial Intelligence Theory]
Subject classification codes:
081104; 0812; 0835; 1405
Abstract
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. We derive stability bounds for both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Applying our results to the convex case, we provide new insights for why multiple epochs of stochastic gradient methods generalize well in practice. In the non-convex case, we give a new interpretation of common practices in neural networks, and formally show that popular techniques for training large deep models are indeed stability-promoting. Our findings conceptually underscore the importance of reducing training time beyond its obvious benefit.
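For context, the paper's convex-case guarantee is quantitative: for an L-Lipschitz, β-smooth convex loss run with step sizes α_t ≤ 2/β, SGM is uniformly stable with stability parameter on the order of (2L²/n) · Σ_t α_t, so the generalization gap shrinks with the sample size n and grows only with the accumulated step sizes. The sketch below is an illustrative probe of this notion, not code from the paper: it trains the same stochastic gradient method (logistic regression on synthetic data; all names and hyperparameters here are arbitrary choices for illustration) on two datasets that differ in a single example, the "neighboring" pair in the Bousquet-Elisseeff definition, and reports how far the learned parameters diverge.

import numpy as np

def sgm(X, y, lr=0.01, epochs=5, seed=0):
    # Stochastic gradient method on the logistic loss. A fixed seed gives the
    # same per-epoch sample order in both runs, coupling them the way the
    # stability argument does.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            z = X[i] @ w
            grad = (1.0 / (1.0 + np.exp(-z)) - y[i]) * X[i]  # d/dw of log(1+e^z) - y*z
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.normal(size=(n, d))               # hypothetical synthetic features
y = (rng.random(n) < 0.5).astype(float)   # hypothetical binary labels

# Neighboring dataset: identical except for one replaced example.
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)
y2[0] = 1.0 - y2[0]

w1 = sgm(X, y)
w2 = sgm(X2, y2)
# For a Lipschitz loss, this parameter divergence bounds (up to the Lipschitz
# constant) the change in loss on any example, which is exactly the quantity
# uniform stability controls.
print("parameter divergence:", np.linalg.norm(w1 - w2))

Under the assumptions above, rerunning with more epochs or larger step sizes should increase the reported divergence, while a larger n should shrink it, in line with the (2L²/n) · Σ_t α_t scaling.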
Pages: 10
Related papers (50 in total):
  • [1] Train simultaneously, generalize better: Stability of gradient-based minimax learners
    Farnia, Farzan
    Ozdaglar, Asuman
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] Towards stability and optimality in stochastic gradient descent
    Toulis, Panos
    Tran, Dustin
    Airoldi, Edoardo M.
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 1290 - 1298
  • [3] Stability and Generalization of Decentralized Stochastic Gradient Descent
    Sun, Tao
    Li, Dongsheng
    Wang, Bao
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI 2021), 2021, 35 : 9756 - 9764
  • [4] Global Convergence and Stability of Stochastic Gradient Descent
    Patel, Vivak
    Zhang, Shushu
    Tian, Bowen
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [5] Data-Dependent Stability of Stochastic Gradient Descent
    Kuzborskij, Ilja
    Lampert, Christoph H.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [6] Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
    Bassily, Raef
    Feldman, Vitaly
    Guzman, Cristobal
    Talwar, Kunal
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [7] Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
    Zhu, Miaoxi
    Shen, Li
    Du, Bo
    Tao, Dacheng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [8] Stability and optimization error of stochastic gradient descent for pairwise learning
    Shen, Wei
    Yang, Zhenhuan
    Ying, Yiming
    Yuan, Xiaoming
    [J]. ANALYSIS AND APPLICATIONS, 2020, 18 (05) : 887 - 927
  • [9] Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent
    Shen, Shuheng
    Xu, Linli
    Liu, Jingchang
    Liang, Xianfeng
    Cheng, Yifei
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4582 - 4589
  • [10] How to train a discriminative front end with stochastic gradient descent and maximum mutual information
    Droppo, J
    Mahajan, M
    Gunawardana, A
    Acero, A
    [J]. 2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005, : 41 - 46