A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

Cited by: 0
Authors
Bu, Zhiqi [1]
Xu, Shiyun [1]
Chen, Kan [1]
Affiliation
[1] Univ Penn, Dept Appl Math & Computat Sci, Philadelphia, PA 19104 USA
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When equipped with efficient optimization algorithms, over-parameterized neural networks have demonstrated a high level of performance even though the loss function is nonconvex and non-smooth. While many works have focused on understanding the loss dynamics of neural networks trained with gradient descent (GD), in this work we consider a broad class of optimization algorithms that are commonly used in practice. For example, we show from a dynamical systems perspective that the Heavy Ball (HB) method can converge to the global minimum of the mean squared error (MSE) at a linear rate (similar to GD), whereas Nesterov accelerated gradient descent (NAG) may only converge to the global minimum sublinearly. Our results rely on the connection between the neural tangent kernel (NTK) and finitely wide over-parameterized neural networks with ReLU activation, which leads us to analyze the limiting ordinary differential equations (ODEs) of the optimization algorithms. We show that optimizing the non-convex loss over the weights corresponds to optimizing a strongly convex loss over the prediction error. As a consequence, we can leverage classical convex optimization theory to understand the convergence behavior of neural networks. We believe our approach can also be extended to other optimization algorithms and network architectures.
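The abstract's key reduction is that training dynamics can be studied as an optimization of a strongly convex loss over the prediction error. The sketch below illustrates the discrete-time behavior of GD, HB, and NAG on a strongly convex quadratic surrogate; the quadratic objective, step size, and momentum coefficient are illustrative assumptions, not the paper's actual setup or proof.

```python
import numpy as np

# Illustrative comparison of GD, Heavy Ball (HB), and Nesterov (NAG) on a
# strongly convex quadratic f(x) = 0.5 * x^T A x, a stand-in for the
# strongly convex loss over the prediction error mentioned in the abstract.
rng = np.random.default_rng(0)
d = 20
M = rng.standard_normal((d, d))
A = M @ M.T / d + np.eye(d)          # symmetric positive definite
L = np.linalg.eigvalsh(A).max()      # smoothness constant
eta = 1.0 / L                        # step size (assumed choice)
beta = 0.9                           # momentum coefficient (assumed choice)
steps = 200

def grad(x):
    return A @ x

x0 = rng.standard_normal(d)

# Gradient descent: x_{k+1} = x_k - eta * grad(x_k)
x = x0.copy()
gd_loss = []
for _ in range(steps):
    x = x - eta * grad(x)
    gd_loss.append(0.5 * x @ A @ x)

# Heavy Ball: x_{k+1} = x_k - eta * grad(x_k) + beta * (x_k - x_{k-1})
x, x_prev = x0.copy(), x0.copy()
hb_loss = []
for _ in range(steps):
    x_next = x - eta * grad(x) + beta * (x - x_prev)
    x_prev, x = x, x_next
    hb_loss.append(0.5 * x @ A @ x)

# Nesterov: the gradient is evaluated at a look-ahead point y_k
x, x_prev = x0.copy(), x0.copy()
nag_loss = []
for _ in range(steps):
    y = x + beta * (x - x_prev)
    x_next = y - eta * grad(y)
    x_prev, x = x, x_next
    nag_loss.append(0.5 * x @ A @ x)

# On a strongly convex quadratic all three methods drive the loss to zero;
# the differing rates the paper proves concern the NTK regime, where the
# limiting ODEs of these updates are the objects actually analyzed.
print(gd_loss[-1], hb_loss[-1], nag_loss[-1])
```

Note that this toy quadratic is only a discrete-time analogue: the paper's results come from the limiting ODEs of these updates under the NTK correspondence, not from finite-dimensional quadratics.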
Pages: 12