A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

Times cited: 0

Authors:

Bu, Zhiqi [1]

Xu, Shiyun [1]

Chen, Kan [1]

Affiliation:

[1] Univ Penn, Dept Appl Math & Computat Sci, Philadelphia, PA 19104 USA

Source:

24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS) | 2021 / Vol. 130

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

When equipped with efficient optimization algorithms, over-parameterized neural networks have demonstrated a high level of performance even though the loss function is non-convex and non-smooth. While many works have focused on understanding the loss dynamics of neural networks trained with gradient descent (GD), in this work we consider a broad class of optimization algorithms that are commonly used in practice. For example, we show from a dynamical system perspective that the Heavy Ball (HB) method can converge to the global minimum of the mean squared error (MSE) at a linear rate (similar to GD); however, Nesterov accelerated gradient descent (NAG) may only converge to the global minimum sublinearly. Our results rely on the connection between the neural tangent kernel (NTK) and finite-width over-parameterized neural networks with ReLU activation, which leads to analyzing the limiting ordinary differential equations (ODEs) of the optimization algorithms. We show that optimizing the non-convex loss over the weights corresponds to optimizing some strongly convex loss over the prediction error. As a consequence, we can leverage classical convex optimization theory to understand the convergence behavior of neural networks. We believe our approach can also be extended to other optimization algorithms and network architectures.
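The contrast drawn in the abstract (linear convergence for HB, possibly only sublinear convergence for NAG) can be illustrated numerically. The sketch below is our own illustration, not the authors' code, and rests on stated assumptions: the prediction-error dynamics are taken to be linear ODEs driven by a fixed symmetric positive definite NTK Gram matrix, so each eigencomponent with eigenvalue lam evolves independently; HB is modeled by the critically damped ODE e'' + 2*sqrt(lam)*e' + lam*e = 0, and NAG by the ODE of Su, Boyd and Candès, e'' + (3/t)*e' + lam*e = 0. The function names rk4, dynamics and energy_at and all parameter values are hypothetical.

```python
import math

# Illustrative sketch, NOT the authors' code. Assumption: in the NTK regime the
# prediction error obeys a linear ODE driven by a fixed symmetric positive definite
# Gram matrix H, so the dynamics decouple along H's eigenvectors. We therefore track
# one error component e(t) for a single (hypothetical) eigenvalue lam > 0.


def rk4(f, t, y, dt, steps):
    """Integrate y' = f(t, y) with classic fourth-order Runge-Kutta; y is a tuple."""
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k1)))
        k3 = f(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k2)))
        k4 = f(t + dt, tuple(a + dt * b for a, b in zip(y, k3)))
        y = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        t += dt
    return y


def dynamics(method, lam):
    """Right-hand side of the assumed limiting ODE for the state (e, v), v = de/dt."""
    if method == "gd":   # gradient flow: e' = -lam * e (v stays 0)
        return lambda t, y: (-lam * y[0], 0.0)
    if method == "hb":   # heavy ball, critically damped: e'' + 2*sqrt(lam)*e' + lam*e = 0
        return lambda t, y: (y[1], -2.0 * math.sqrt(lam) * y[1] - lam * y[0])
    if method == "nag":  # Su-Boyd-Candes NAG ODE: e'' + (3/t)*e' + lam*e = 0
        return lambda t, y: (y[1], -(3.0 / t) * y[1] - lam * y[0])
    raise ValueError(f"unknown method: {method}")


def energy_at(method, T, lam=1.0, dt=1e-2, t0=0.1):
    """Return 0.5*v^2 + 0.5*lam*e^2 at time T, starting from e = 1, v = 0 at time t0."""
    # Starting at t0 > 0 avoids the 3/t singularity of the NAG ODE at t = 0.
    steps = int(round((T - t0) / dt))
    e, v = rk4(dynamics(method, lam), t0, (1.0, 0.0), dt, steps)
    return 0.5 * v * v + 0.5 * lam * e * e


if __name__ == "__main__":
    for method in ("gd", "hb", "nag"):
        # Exponential decay (GD, HB) vs. polynomial decay (NAG) of the error energy
        print(method, energy_at(method, 20.0), energy_at(method, 40.0))
```

Under these assumptions, with lam = 1 the GD and HB energies fall exponentially and are already below 1e-12 at T = 20, while the NAG energy shrinks only polynomially (roughly like T^-3), so doubling T reduces it by only about a factor of eight. The critical damping 2*sqrt(lam) is one convenient choice for HB, not a tuning taken from the paper.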


Pages: 12
