A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

Cited by: 0
Authors
Bu, Zhiqi [1]
Xu, Shiyun [1]
Chen, Kan [1]
Affiliation
[1] Univ Penn, Dept Appl Math & Computat Sci, Philadelphia, PA 19104 USA
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When equipped with efficient optimization algorithms, over-parameterized neural networks have demonstrated a high level of performance even though the loss function is nonconvex and non-smooth. While many works have focused on understanding the loss dynamics of neural networks trained with gradient descent (GD), in this work we consider a broad class of optimization algorithms that are commonly used in practice. For example, we show from a dynamical systems perspective that the Heavy Ball (HB) method can converge to the global minimum of the mean squared error (MSE) at a linear rate (similar to GD), whereas Nesterov accelerated gradient descent (NAG) may only converge to the global minimum sublinearly. Our results rely on the connection between the neural tangent kernel (NTK) and finitely wide over-parameterized neural networks with ReLU activation, which leads us to analyze the limiting ordinary differential equations (ODEs) of the optimization algorithms. We show that optimizing the non-convex loss over the weights corresponds to optimizing a strongly convex loss over the prediction error. As a consequence, we can leverage classical convex optimization theory to understand the convergence behavior of neural networks. We believe our approach can also be extended to other optimization algorithms and network architectures.
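The abstract's key reduction is that training dynamics can be studied as an optimization of a strongly convex loss over the prediction error. The sketch below illustrates the discrete-time behavior of GD, HB, and NAG on a strongly convex quadratic surrogate; the quadratic objective, step size, and momentum coefficient are illustrative assumptions, not the paper's actual setup or proof.

```python
import numpy as np

# Illustrative comparison of GD, Heavy Ball (HB), and Nesterov (NAG) on a
# strongly convex quadratic f(x) = 0.5 * x^T A x, a stand-in for the
# strongly convex loss over the prediction error mentioned in the abstract.
rng = np.random.default_rng(0)
d = 20
M = rng.standard_normal((d, d))
A = M @ M.T / d + np.eye(d)          # symmetric positive definite
L = np.linalg.eigvalsh(A).max()      # smoothness constant
eta = 1.0 / L                        # step size (assumed choice)
beta = 0.9                           # momentum coefficient (assumed choice)
steps = 200

def grad(x):
    return A @ x

x0 = rng.standard_normal(d)

# Gradient descent: x_{k+1} = x_k - eta * grad(x_k)
x = x0.copy()
gd_loss = []
for _ in range(steps):
    x = x - eta * grad(x)
    gd_loss.append(0.5 * x @ A @ x)

# Heavy Ball: x_{k+1} = x_k - eta * grad(x_k) + beta * (x_k - x_{k-1})
x, x_prev = x0.copy(), x0.copy()
hb_loss = []
for _ in range(steps):
    x_next = x - eta * grad(x) + beta * (x - x_prev)
    x_prev, x = x, x_next
    hb_loss.append(0.5 * x @ A @ x)

# Nesterov: the gradient is evaluated at a look-ahead point y_k
x, x_prev = x0.copy(), x0.copy()
nag_loss = []
for _ in range(steps):
    y = x + beta * (x - x_prev)
    x_next = y - eta * grad(y)
    x_prev, x = x, x_next
    nag_loss.append(0.5 * x @ A @ x)

# On a strongly convex quadratic all three methods drive the loss to zero;
# the differing rates the paper proves concern the NTK regime, where the
# limiting ODEs of these updates are the objects actually analyzed.
print(gd_loss[-1], hb_loss[-1], nag_loss[-1])
```

Note that this toy quadratic is only a discrete-time analogue: the paper's results come from the limiting ODEs of these updates under the NTK correspondence, not from finite-dimensional quadratics.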
Pages: 12