Generalization Error Bounds for Optimization Algorithms via Stability

Cited: 0
Authors
Meng, Qi [1 ]
Wang, Yue [2 ]
Chen, Wei [3 ]
Wang, Taifeng [3 ]
Ma, Zhi-Ming [4 ]
Liu, Tie-Yan [3 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Beijing Jiaotong Univ, Beijing, Peoples R China
[3] Microsoft Res, Redmond, WA USA
[4] Chinese Acad Math & Syst Sci, Beijing, Peoples R China
Keywords: (none listed)
DOI: not available
CLC number: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM) and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduced gradient (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during the training process; however, the machine learning community often cares more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue using stability as a tool. In particular, we decompose the generalization error for R-ERM and derive its upper bound for both convex and nonconvex cases. In the convex case, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (in the order of O(1/n) + E[ρ(T)], where ρ(T) is the convergence error and T is the number of iterations) and in high probability (in the order of O(log(1/δ)/√n + ρ(T)) with probability 1 − δ). For the nonconvex case, we obtain a similar expected generalization error bound. Our theorems indicate that 1) along with the training process, the generalization error decreases for all the optimization algorithms under our investigation; and 2) comparatively speaking, SVRG has better generalization ability than GD and SGD. We have conducted experiments on both convex and nonconvex problems, and the experimental results verify our theoretical findings.
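To make the abstract's bound concrete, the LaTeX sketch below restates the R-ERM objective and the two-term error splitting the abstract refers to. The notation is our own shorthand (ℓ for the loss, N for the regularizer, λ for the regularization coefficient, w_T for the iterate after T steps); the exact constants and formal statements are those of the paper, not reproduced here.

```latex
% Hedged sketch of the setup (our notation; see the paper for exact statements).
% R-ERM objective over a training sample S = {z_1, ..., z_n}:
\[
  \hat{R}_S(w) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell(w; z_i) \;+\; \lambda N(w).
\]
% Let w_T be the optimizer's iterate after T steps and \rho(T) its convergence
% error on \hat{R}_S. The abstract's convex-case bounds then read: a stability
% term of order 1/n plus the optimization term,
\[
  \mathbb{E}\big[\mathrm{GenErr}(w_T)\big]
    \;\le\; \mathcal{O}\!\left(\tfrac{1}{n}\right) + \mathbb{E}\,\rho(T),
  \qquad\text{and, with probability } 1-\delta,\quad
  \mathcal{O}\!\left(\tfrac{\log(1/\delta)}{\sqrt{n}} + \rho(T)\right).
\]
```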
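Since the abstract's second finding compares SVRG against GD and SGD, the following minimal Python sketch (illustrative only, not the authors' code; all function names and hyperparameters are our own) shows the variance-corrected update that distinguishes SVRG from plain SGD, on a convex R-ERM instance (ridge regression):

```python
# Minimal sketch (illustrative, not the authors' code): SGD vs. SVRG on a convex
# R-ERM instance, ridge regression with objective
#   R_S(w) = (1/n) * sum_i 0.5 * (x_i . w - y_i)^2 + 0.5 * lam * ||w||^2
import numpy as np

def grad_i(w, X, y, lam, i):
    """Gradient of the i-th regularized loss term at w."""
    return (X[i] @ w - y[i]) * X[i] + lam * w

def sgd(X, y, lam, lr=0.01, epochs=20, seed=0):
    """Plain SGD: one noisy stochastic gradient per step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.integers(0, n, size=n):
            w -= lr * grad_i(w, X, y, lam, i)
    return w

def svrg(X, y, lam, lr=0.05, epochs=20, seed=0):
    """SVRG (Johnson & Zhang, 2013): each inner step uses the variance-corrected
    direction  g_i(w) - g_i(w_snap) + full_grad(w_snap)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot, recomputed once per outer epoch.
        full = X.T @ (X @ w_snap - y) / n + lam * w_snap
        for i in rng.integers(0, n, size=n):
            v = grad_i(w, X, y, lam, i) - grad_i(w_snap, X, y, lam, i) + full
            w -= lr * v
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, lam = 500, 20, 0.1
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    obj = lambda w: 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w
    print("SGD  training objective:", obj(sgd(X, y, lam)))
    print("SVRG training objective:", obj(svrg(X, y, lam)))
```

The snapshot correction drives the gradient variance toward zero as the iterates approach the optimum, which is what lets SVRG reach a smaller convergence error ρ(T) per pass and, via the bounds above, a tighter generalization guarantee than plain SGD.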
Pages: 2336-2342
Page count: 7