Analysis of natural gradient descent for multilayer neural networks

Cited by: 14
Authors
Rattray, M
Saad, D
Affiliations
[1] Univ Manchester, Dept Comp Sci, Manchester M13 9PL, Lancs, England
[2] Aston Univ, Neural Comp Res Grp, Birmingham B4 7ET, W Midlands, England
Source
PHYSICAL REVIEW E, 1999, Vol. 59, No. 4
Keywords
DOI
10.1103/PhysRevE.59.4523
Chinese Library Classification (CLC)
O35 [Fluid Mechanics]; O53 [Plasma Physics]
Subject Classification Codes
070204; 080103; 080704
Abstract
Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line, using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics that accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning attains optimal asymptotic performance and outperforms standard gradient descent in the transient, significantly shortening or even removing the plateaus in generalization performance that typically hamper gradient-descent training.
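In the standard formulation (due to Amari; the notation below is ours, not quoted from the paper), the metric on parameter space is the Fisher information matrix G of the model's input-output distribution, and the natural gradient update premultiplies the ordinary gradient of the generalization error by its inverse:

\[
\theta_{t+1} = \theta_t - \eta\, G^{-1}(\theta_t)\, \nabla_\theta \epsilon(\theta_t),
\qquad
G_{ij}(\theta) = \mathbb{E}_{x,y}\!\left[ \partial_{\theta_i} \ln p(y \mid x; \theta)\; \partial_{\theta_j} \ln p(y \mid x; \theta) \right],
\]

so the step is steepest descent with respect to the Riemannian distance on parameter space rather than the Euclidean one.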
Pages: 4523-4532
Number of pages: 10
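A minimal, illustrative sketch of the algorithm follows; it is not the paper's analytic treatment. It trains a soft committee machine (the architecture analyzed by Rattray and Saad) on-line against a teacher of the same form, preconditioning each step with a damped empirical Fisher matrix. The batch size, damping constant eps, learning rate eta, and network sizes are all assumptions chosen for illustration; the script needs only numpy and scipy.

    # Illustrative on-line natural gradient descent for a soft committee
    # machine f(x) = sum_k erf(w_k . x / sqrt(2)); hyperparameters are
    # assumptions, not values taken from the paper.
    import numpy as np
    from scipy.special import erf

    rng = np.random.default_rng(0)
    N, K = 20, 2                             # input dimension, hidden units
    W_teacher = rng.standard_normal((K, N))  # teacher defining the task

    def forward(W, X):
        # Network output for a batch X of shape (B, N).
        return erf(X @ W.T / np.sqrt(2)).sum(axis=1)

    def grads(W, X):
        # Per-sample gradients of the output w.r.t. W, shape (B, K*N).
        a = X @ W.T                                   # pre-activations (B, K)
        d = np.sqrt(2.0 / np.pi) * np.exp(-a**2 / 2)  # d/da erf(a/sqrt(2))
        return (d[:, :, None] * X[:, None, :]).reshape(len(X), -1)

    W = 0.01 * rng.standard_normal((K, N))  # near-symmetric student start
    eta, eps = 0.05, 1e-3                   # step size and Fisher damping

    for t in range(2001):
        X = rng.standard_normal((64, N))             # fresh on-line examples
        err = forward(W, X) - forward(W_teacher, X)  # residuals
        G = grads(W, X)
        grad = G.T @ err / len(X)                    # gradient of 0.5 * MSE
        F = G.T @ G / len(X) + eps * np.eye(K * N)   # damped empirical Fisher
        W -= eta * np.linalg.solve(F, grad).reshape(K, N)  # natural step
        if t % 500 == 0:
            Xt = rng.standard_normal((4096, N))
            eg = 0.5 * np.mean((forward(W, Xt) - forward(W_teacher, Xt))**2)
            print(f"step {t:4d}  generalization error ~ {eg:.4f}")

Exact inversion of the (K*N)-dimensional Fisher matrix is affordable only at this toy scale. Replacing np.linalg.solve(F, grad) with grad recovers plain gradient descent, which, as the abstract notes, tends to linger far longer on the symmetric-phase plateau.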
Related papers
50 records in total
  • [1] Analysis of Gradient Descent Learning Algorithms for Multilayer Feedforward Neural Networks
    Guo, H
    Gelfand, SB
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, 1991, 38(8): 883-894
  • [2] Optimization of Graph Neural Networks with Natural Gradient Descent
    Izadi, Mohammad Rasool
    Fang, Yihao
    Stevenson, Robert
    Lin, Lizhen
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020: 171-179
  • [3] Dynamics of on-line gradient descent learning for multilayer neural networks
    Saad, D
    Solla, SA
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8: 302-308
  • [4] Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
    Zhang, Guodong
    Martens, James
    Grosse, Roger
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] A Convergence Analysis of Gradient Descent on Graph Neural Networks
    Awasthi, Pranjal
    Das, Abhimanyu
    Gollapudi, Sreenivas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [6] Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training
    Soudry, Daniel
    Di Castro, Dotan
    Gal, Asaf
    Kolodny, Avinoam
    Kvatinsky, Shahar
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26(10): 2408-2421
  • [7] Inversion of Neural Networks by Gradient Descent
    Kindermann, J
    Linden, A
    PARALLEL COMPUTING, 1990, 14(3): 277-286
  • [8] Gradient Descent for Spiking Neural Networks
    Huh, Dongsung
    Sejnowski, Terrence J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [9] Gradient Descent Analysis: On Visualizing the Training of Deep Neural Networks
    Becker, Martin
    Lippel, Jens
    Zielke, Thomas
    PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL 3: IVAPP, 2019: 338-345
  • [10] Nonlinear system identification using neural networks trained with natural gradient descent
    Ibnkahla, M
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2003, 2003(12): 1229-1237