Train simultaneously, generalize better: Stability of gradient-based minimax learners

Cited by: 0
Authors
Farnia, Farzan [1 ]
Ozdaglar, Asuman [1 ]
Institutions
[1] MIT, Lab Informat & Decis Syst, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The success of minimax learning problems, such as those underlying generative adversarial networks (GANs), has been observed to depend on the minimax optimization algorithm used for training. This dependence is commonly attributed to the convergence speed and robustness properties of the underlying optimization algorithm. In this paper, we show that the optimization algorithm also plays a key role in the generalization performance of the trained minimax model. To this end, we analyze the generalization properties of the standard gradient descent ascent (GDA) and proximal point method (PPM) algorithms through the lens of algorithmic stability (Bousquet & Elisseeff, 2002), in both convex-concave and nonconvex-nonconcave minimax settings. While the GDA algorithm is not guaranteed to have a vanishing excess risk in convex-concave problems, we show that the PPM algorithm enjoys a bounded excess risk in the same setup. For nonconvex-nonconcave problems, we compare the generalization performance of the stochastic GDA and GDmax algorithms, where the latter fully solves the maximization subproblem at every iteration. Our generalization analysis suggests the superiority of GDA, provided that the minimization and maximization subproblems are solved simultaneously with similar learning rates. We discuss several numerical results indicating the role of optimization algorithms in the generalization of learned minimax models.
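The abstract contrasts simultaneous GDA with the proximal point method in the convex-concave regime. A minimal illustrative sketch (not from the paper) on the classic bilinear toy problem f(x, y) = x * y shows the qualitative gap: simultaneous GDA spirals away from the saddle point (0, 0), while the implicit PPM update contracts toward it.

```python
# Illustrative sketch, not the paper's code: compare simultaneous GDA and
# the proximal point method (PPM) on the bilinear minimax problem
#   min_x max_y f(x, y) = x * y,   saddle point at (0, 0).

def gda_step(x, y, lr):
    # Simultaneous gradient descent-ascent: both players update
    # using the current iterate (x, y).
    return x - lr * y, y + lr * x

def ppm_step(x, y, lr):
    # Implicit proximal point update, solved in closed form for f = x * y:
    #   x+ = x - lr * y+,   y+ = y + lr * x+
    # Solving the 2x2 linear system gives the contraction below.
    return (x - lr * y) / (1 + lr**2), (y + lr * x) / (1 + lr**2)

def run(step, x=1.0, y=1.0, lr=0.1, iters=100):
    for _ in range(iters):
        x, y = step(x, y, lr)
    return (x**2 + y**2) ** 0.5  # distance from the saddle point

if __name__ == "__main__":
    print(f"GDA distance after 100 steps: {run(gda_step):.3f}")  # grows
    print(f"PPM distance after 100 steps: {run(ppm_step):.3f}")  # shrinks
```

Each GDA step multiplies the squared distance by (1 + lr^2), whereas each PPM step divides it by the same factor, which mirrors the abstract's claim that GDA lacks a vanishing excess-risk guarantee in the convex-concave setting while PPM is better behaved.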
Pages: 12
Related Papers
24 records in total
  • [1] Train faster, generalize better: Stability of stochastic gradient descent
    Hardt, Moritz
    Recht, Benjamin
    Singer, Yoram
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016
  • [2] Gradient-based minimax optimization of planar microstrip structures with the use of electromagnetic simulations
    Ureel, J
    DeZutter, DL
    [J]. INTERNATIONAL JOURNAL OF MICROWAVE AND MILLIMETER-WAVE COMPUTER-AIDED ENGINEERING, 1997, 7 (01): : 29 - 36
  • [3] Gradient-based optimization for regression in the functional tensor-train format
    Gorodetsky, Alex A.
    Jakeman, John D.
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2018, 374 : 1219 - 1238
  • [4] Bio-inspired and gradient-based algorithms to train MLPs: The influence of diversity
    Pasti, Rodrigo
    de Castro, Leandro Nunes
    [J]. INFORMATION SCIENCES, 2009, 179 (10) : 1441 - 1453
  • [5] Stability analysis of gradient-based neural networks for optimization problems
    Han, Qiaoming
    Liao, Li-Zhi
    Qi, Houduo
    Qi, Liqun
    [J]. JOURNAL OF GLOBAL OPTIMIZATION, 2001, 19 (04) : 363 - 381
  • [6] A Novel Control-Variates Approach for Performative Gradient-Based Learners with Missing Data
    Han, Xing
    Hu, Jing
    Ghosh, Joydeep
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [7] An immune and a gradient-based method to train multi-layer perceptron neural networks
    Pasti, Rodrigo
    de Castro, Leandro Nunes
    [J]. 2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006, : 2075 - +
  • [8] Chaotic Global Optimization by Direct Stability Control of Gradient-Based Systems
    Masuda, Kazuaki
    Kurihara, Kenzo
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 4690 - 4697
  • [9] A dialog-oriented and gradient-based stability margin in uncertain systems
    Weinmann, A
    [J]. CYBERNETICS AND SYSTEMS, 2005, 36 (07) : 641 - 666