Generalization Error Bounds for Optimization Algorithms via Stability

Cited by: 0
Authors
Meng, Qi [1 ]
Wang, Yue [2 ]
Chen, Wei [3 ]
Wang, Taifeng [3 ]
Ma, Zhi-Ming [4 ]
Liu, Tie-Yan [3 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Beijing Jiaotong Univ, Beijing, Peoples R China
[3] Microsoft Res, Redmond, WA USA
[4] Chinese Acad Math & Syst Sci, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM) and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduced gradient (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during training; however, the machine learning community often cares more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue using stability as a tool. In particular, we decompose the generalization error for R-ERM and derive its upper bound for both convex and nonconvex cases. In the convex case, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (of order O(1/n) + E[ρ(T)], where ρ(T) is the convergence error and T is the number of iterations) and in high probability (of order O(log(1/δ)/√n + ρ(T)) with probability 1 − δ). For the nonconvex case, we obtain a similar expected generalization error bound. Our theorems indicate that (1) as training proceeds, the generalization error decreases for all the optimization algorithms under investigation; and (2) SVRG has better generalization ability than GD and SGD. We have conducted experiments on both convex and nonconvex problems, and the experimental results verify our theoretical findings.
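The bound above separates a stability term of order O(1/n) from the optimization error ρ(T), so the generalization gap can be observed empirically as the difference between the empirical risk and the risk on held-out data. Below is a minimal sketch of that setup, assuming an ℓ2-regularized logistic regression R-ERM problem solved by SVRG; this is not the authors' experimental code, and the synthetic data, step size, and helper names (make_data, risk, grad_i, svrg) are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's experimental code):
# SVRG on an l2-regularized logistic regression R-ERM problem, reporting the
# gap between the empirical risk and a held-out estimate of the expected risk.
import numpy as np

rng = np.random.default_rng(0)


def make_data(n, d, w_true):
    """Synthetic binary classification data with labels in {-1, +1}."""
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))
    return X, y


def risk(w, X, y, lam):
    """Regularized risk: mean logistic loss + (lam/2) * ||w||^2."""
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins))) + 0.5 * lam * (w @ w)


def grad_i(w, x_i, y_i, lam):
    """Gradient of one regularized logistic loss term."""
    margin = y_i * (x_i @ w)
    return -y_i * x_i / (1.0 + np.exp(margin)) + lam * w


def svrg(X, y, lam, eta=0.05, outer_epochs=20):
    """SVRG (Johnson & Zhang, 2013): snapshot full gradient + variance-reduced updates."""
    n, d = X.shape
    w_snap = np.zeros(d)
    for _ in range(outer_epochs):
        # Full gradient at the snapshot point.
        mu = np.mean([grad_i(w_snap, X[i], y[i], lam) for i in range(n)], axis=0)
        w = w_snap.copy()
        for _ in range(n):  # inner loop of length n
            i = rng.integers(n)
            # Variance-reduced stochastic gradient.
            g = grad_i(w, X[i], y[i], lam) - grad_i(w_snap, X[i], y[i], lam) + mu
            w -= eta * g
        w_snap = w
    return w_snap


if __name__ == "__main__":
    d, lam = 20, 1e-3
    w_true = rng.normal(size=d)
    X_train, y_train = make_data(500, d, w_true)   # training sample of size n
    X_test, y_test = make_data(5000, d, w_true)    # held-out proxy for the expected risk
    w = svrg(X_train, y_train, lam)
    train_risk = risk(w, X_train, y_train, lam)
    test_risk = risk(w, X_test, y_test, lam)
    print(f"empirical risk = {train_risk:.4f}, held-out risk = {test_risk:.4f}, "
          f"gap = {test_risk - train_risk:.4f}")
```

Under the same interface, GD or SGD could be substituted for svrg and the gap recorded per iteration to see how it evolves with T, mirroring the comparison the abstract describes.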
Pages: 2336-2342
Number of pages: 7