Generalization Error Bounds for Optimization Algorithms via Stability

Cited: 0
Authors
Meng, Qi [1 ]
Wang, Yue [2 ]
Chen, Wei [3 ]
Wang, Taifeng [3 ]
Ma, Zhi-Ming [4 ]
Liu, Tie-Yan [3 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Beijing Jiaotong Univ, Beijing, Peoples R China
[3] Microsoft Res, Redmond, WA USA
[4] Chinese Acad Math & Syst Sci, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM) and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance-reduced gradient (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during training; however, the machine learning community often cares more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue using stability as a tool. In particular, we decompose the generalization error for R-ERM and derive upper bounds for both the convex and nonconvex cases. In the convex case, we prove that the generalization error can be bounded in terms of the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (of order O(1/n) + E[ρ(T)], where ρ(T) is the convergence error after T iterations) and with high probability (of order O(log(1/δ)/√n + ρ(T)) with probability 1 − δ). In the nonconvex case, we obtain a similar bound on the expected generalization error. Our theorems indicate that (1) as training proceeds, the generalization error decreases for all the optimization algorithms under investigation; and (2) SVRG has better generalization ability than GD and SGD. We conducted experiments on both convex and nonconvex problems, and the experimental results verify our theoretical findings.
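The decomposition behind these bounds can be sketched as follows; the notation is standard usage rather than taken verbatim from the paper (R is the expected risk, R_S the regularized empirical risk on a training sample of size n, w_T the iterate after T optimizer steps), and the exact constants and regularity conditions are omitted.

% Sketch only: constants and regularity assumptions are left implicit here.
\begin{align*}
\underbrace{\mathbb{E}\bigl[R(w_T) - R_S(w_T)\bigr]}_{\text{stability of the R-ERM process}}
  &\le O\!\left(\frac{1}{n}\right), \\
\underbrace{\mathbb{E}\bigl[R_S(w_T) - \min_{w} R_S(w)\bigr]}_{\text{convergence error of the optimizer}}
  &\le \mathbb{E}\,\rho(T),
\end{align*}
so the expected generalization error is of order $O(1/n) + \mathbb{E}\,\rho(T)$; the
high-probability analogue is of order $O\bigl(\log(1/\delta)/\sqrt{n}\bigr) + \rho(T)$
with probability at least $1-\delta$.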
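As a companion illustration of the experiments described above, here is a minimal sketch (not the authors' code) that compares SGD and SVRG on an L2-regularized logistic regression problem and reports the train/test loss gap as a rough proxy for generalization error; the synthetic data, step size, and epoch counts are illustrative assumptions.

# Minimal sketch (not the authors' code): compare SGD and SVRG on
# L2-regularized logistic regression and report the train/test loss gap
# as a rough proxy for generalization error. Data, step size, and epoch
# counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 500, 20, 1e-3          # training size, dimension, L2 weight

w_true = rng.normal(size=d)
def make_data(m):
    # Linear model with label noise.
    X = rng.normal(size=(m, d))
    y = np.sign(X @ w_true + 0.5 * rng.normal(size=m))
    return X, y

X_tr, y_tr = make_data(n)
X_te, y_te = make_data(2000)

def loss(w, X, y):
    # Average logistic loss plus L2 regularization (the R-ERM objective).
    z = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -z)) + 0.5 * lam * (w @ w)

def grad(w, X, y, idx):
    # Gradient of the regularized loss on the samples indexed by `idx`.
    Xi, yi = X[idx], y[idx]
    z = yi * (Xi @ w)
    coef = -yi * np.exp(-np.logaddexp(0.0, z))   # -y * sigmoid(-z), numerically stable
    return Xi.T @ coef / len(idx) + lam * w

def sgd(epochs=30, eta=0.05):
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            w -= eta * grad(w, X_tr, y_tr, [i])
    return w

def svrg(epochs=30, eta=0.05):
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = grad(w_snap, X_tr, y_tr, np.arange(n))   # full gradient at snapshot
        for i in rng.permutation(n):
            # Variance-reduced stochastic gradient step.
            g = grad(w, X_tr, y_tr, [i]) - grad(w_snap, X_tr, y_tr, [i]) + mu
            w -= eta * g
    return w

for name, solver in [("SGD", sgd), ("SVRG", svrg)]:
    w = solver()
    tr, te = loss(w, X_tr, y_tr), loss(w, X_te, y_te)
    print(f"{name}: train loss {tr:.4f}, test loss {te:.4f}, gap {te - tr:.4f}")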
Pages: 2336-2342
Number of pages: 7