A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

Citations: 0
Authors
Bu, Zhiqi [1]
Xu, Shiyun [1]
Chen, Kan [1]
Affiliations
[1] Univ Penn, Dept Appl Math & Computat Sci, Philadelphia, PA 19104 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When equipped with efficient optimization algorithms, over-parameterized neural networks have demonstrated a high level of performance even though the loss function is non-convex and non-smooth. While many works have focused on understanding the loss dynamics of neural networks trained with gradient descent (GD), in this work we consider a broad class of optimization algorithms that are commonly used in practice. For example, we show from a dynamical system perspective that the Heavy Ball (HB) method can converge to the global minimum of the mean squared error (MSE) at a linear rate (similar to GD); however, Nesterov accelerated gradient descent (NAG) may only converge to the global minimum sublinearly. Our results rely on the connection between the neural tangent kernel (NTK) and finitely wide over-parameterized neural networks with ReLU activation, which leads to analyzing the limiting ordinary differential equations (ODEs) for optimization algorithms. We show that optimizing the non-convex loss over the weights corresponds to optimizing some strongly convex loss over the prediction error. As a consequence, we can leverage classical convex optimization theory to understand the convergence behavior of neural networks. We believe our approach can also be extended to other optimization algorithms and network architectures.
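The contrast between linear and sublinear convergence described in the abstract can be illustrated with a small numerical sketch. It is not the authors' derivation. It assumes the textbook continuous-time limits of the three methods (gradient flow for GD, a constant-friction second-order ODE for HB, and the ODE with vanishing friction 3/t for NAG), applied to a single eigendirection of a fixed positive definite kernel acting on the prediction error e. The names `LAM`, `DAMPING`, `tail_loss` and the start time `t0` are hypothetical choices made for this illustration.

```python
LAM = 1.0      # assumed eigenvalue of a fixed, positive definite kernel
DAMPING = 2.0  # assumed constant friction coefficient for the Heavy Ball ODE


def gd_rhs(t, e, v):
    # Gradient flow: e' = -LAM * e (the velocity slot is unused).
    return -LAM * e, 0.0


def hb_rhs(t, e, v):
    # Heavy Ball: e'' + DAMPING * e' + LAM * e = 0.
    return v, -DAMPING * v - LAM * e


def nag_rhs(t, e, v):
    # Nesterov: e'' + (3 / t) * e' + LAM * e = 0; the friction vanishes as t grows.
    return v, -(3.0 / t) * v - LAM * e


def rk4_step(rhs, t, e, v, h):
    # One classical fourth-order Runge-Kutta step for the state (e, v).
    k1e, k1v = rhs(t, e, v)
    k2e, k2v = rhs(t + h / 2, e + h / 2 * k1e, v + h / 2 * k1v)
    k3e, k3v = rhs(t + h / 2, e + h / 2 * k2e, v + h / 2 * k2v)
    k4e, k4v = rhs(t + h, e + h * k3e, v + h * k3v)
    e += h / 6 * (k1e + 2 * k2e + 2 * k3e + k4e)
    v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return e, v


def tail_loss(rhs, t0=0.1, t_end=50.0, h=0.001, window=10.0):
    """Largest value of 0.5 * LAM * e**2 over the last `window` time units.

    The start time t0 > 0 avoids the 3 / t singularity of the NAG ODE.
    """
    t, e, v = t0, 1.0, 0.0
    worst = 0.0
    for _ in range(round((t_end - t0) / h)):
        e, v = rk4_step(rhs, t, e, v, h)
        t += h
        if t >= t_end - window:
            worst = max(worst, 0.5 * LAM * e * e)
    return worst


if __name__ == "__main__":
    for name, rhs in [("GD", gd_rhs), ("HB", hb_rhs), ("NAG", nag_rhs)]:
        print(f"{name}: late-time loss {tail_loss(rhs):.3e}")
```

Under these assumptions the GD and HB losses decay exponentially in time, while the NAG loss decays only polynomially because its friction term fades. The loss is measured as a maximum over a late window rather than at a single instant because the NAG trajectory oscillates through zero.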
Pages: 12
Related papers
50 records in total
  • [1] Optimization and Bayes: A Trade-off for Overparameterized Neural Networks
    Hu, Zhengmian
    Huang, Heng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [2] A VIEW OF NEURAL NETWORKS AS DYNAMICAL SYSTEMS
    Cessac, B.
    [J]. INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 2010, 20 (06): 1585-1629
  • [3] The Role of Regularization in Overparameterized Neural Networks
    Satpathi, Siddhartha
    Gupta, Harsh
    Liang, Shiyu
    Srikant, R.
    [J]. 2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020: 4683-4688
  • [4] Global Minima of Overparameterized Neural Networks
    Cooper, Yaim
    [J]. SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2021, 3 (02): 676-691
  • [5] Dynamical learning algorithms for neural networks and neural constructivism
    Blanzieri, E
    [J]. BEHAVIORAL AND BRAIN SCIENCES, 1997, 20 (04): 559-+
  • [6] Mathematical Models of Overparameterized Neural Networks
    Fang, Cong
    Dong, Hanze
    Zhang, Tong
    [J]. PROCEEDINGS OF THE IEEE, 2021, 109 (05): 683-703
  • [7] Overparameterized neural networks implement associative memory
    Radhakrishnan, Adityanarayanan
    Belkin, Mikhail
    Uhler, Caroline
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2020, 117 (44): 27162-27170
  • [8] Convex Formulation of Overparameterized Deep Neural Networks
    Fang, Cong
    Gu, Yihong
    Zhang, Weizhong
    Zhang, Tong
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (08): 5340-5352
  • [9] Overparameterized Nonlinear Optimization with Applications to Neural Nets
    Oymak, Samet
    [J]. 2019 13TH INTERNATIONAL CONFERENCE ON SAMPLING THEORY AND APPLICATIONS (SAMPTA), 2019
  • [10] Dynamics and Perturbations of Overparameterized Linear Neural Networks
    de Oliveira, Arthur Castello B.
    Siami, Milad
    Sontag, Eduardo D.
    [J]. 2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 7356-7361