Analysis of natural gradient descent for multilayer neural networks

Cited by: 14
Authors
Rattray, M
Saad, D
Affiliations
[1] Univ Manchester, Dept Comp Sci, Manchester M13 9PL, Lancs, England
[2] Aston Univ, Neural Comp Res Grp, Birmingham B4 7ET, W Midlands, England
Source
PHYSICAL REVIEW E, 1999, Vol. 59, No. 4
Keywords
DOI
10.1103/PhysRevE.59.4523
Chinese Library Classification (CLC)
O35 [Fluid Mechanics]; O53 [Plasma Physics]
Subject Classification Codes
070204; 080103; 080704
Abstract
Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line, using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics that accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning attains optimal asymptotic performance and outperforms standard gradient descent in the transient, significantly shortening or even removing the plateaus in generalization performance that typically hamper gradient-descent training.
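In the standard formulation (due to Amari; the notation below is ours, not quoted from the paper), the metric on parameter space is the Fisher information matrix G of the model's input-output distribution, and the natural gradient update premultiplies the ordinary gradient of the generalization error by its inverse:

\[
\theta_{t+1} = \theta_t - \eta\, G^{-1}(\theta_t)\, \nabla_\theta \epsilon(\theta_t),
\qquad
G_{ij}(\theta) = \mathbb{E}_{x,y}\!\left[ \partial_{\theta_i} \ln p(y \mid x; \theta)\; \partial_{\theta_j} \ln p(y \mid x; \theta) \right],
\]

so the step is steepest descent with respect to the Riemannian distance on parameter space rather than the Euclidean one.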
Pages: 4523-4532
Number of pages: 10
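A minimal, illustrative sketch of the algorithm follows; it is not the paper's analytic treatment. It trains a soft committee machine (the architecture analyzed by Rattray and Saad) on-line against a teacher of the same form, preconditioning each step with a damped empirical Fisher matrix. The batch size, damping constant eps, learning rate eta, and network sizes are all assumptions chosen for illustration; the script needs only numpy and scipy.

    # Illustrative on-line natural gradient descent for a soft committee
    # machine f(x) = sum_k erf(w_k . x / sqrt(2)); hyperparameters are
    # assumptions, not values taken from the paper.
    import numpy as np
    from scipy.special import erf

    rng = np.random.default_rng(0)
    N, K = 20, 2                             # input dimension, hidden units
    W_teacher = rng.standard_normal((K, N))  # teacher defining the task

    def forward(W, X):
        # Network output for a batch X of shape (B, N).
        return erf(X @ W.T / np.sqrt(2)).sum(axis=1)

    def grads(W, X):
        # Per-sample gradients of the output w.r.t. W, shape (B, K*N).
        a = X @ W.T                                   # pre-activations (B, K)
        d = np.sqrt(2.0 / np.pi) * np.exp(-a**2 / 2)  # d/da erf(a/sqrt(2))
        return (d[:, :, None] * X[:, None, :]).reshape(len(X), -1)

    W = 0.01 * rng.standard_normal((K, N))  # near-symmetric student start
    eta, eps = 0.05, 1e-3                   # step size and Fisher damping

    for t in range(2001):
        X = rng.standard_normal((64, N))             # fresh on-line examples
        err = forward(W, X) - forward(W_teacher, X)  # residuals
        G = grads(W, X)
        grad = G.T @ err / len(X)                    # gradient of 0.5 * MSE
        F = G.T @ G / len(X) + eps * np.eye(K * N)   # damped empirical Fisher
        W -= eta * np.linalg.solve(F, grad).reshape(K, N)  # natural step
        if t % 500 == 0:
            Xt = rng.standard_normal((4096, N))
            eg = 0.5 * np.mean((forward(W, Xt) - forward(W_teacher, Xt))**2)
            print(f"step {t:4d}  generalization error ~ {eg:.4f}")

Exact inversion of the (K*N)-dimensional Fisher matrix is affordable only at this toy scale. Replacing np.linalg.solve(F, grad) with grad recovers plain gradient descent, which, as the abstract notes, tends to linger far longer on the symmetric-phase plateau.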
Related papers
50 records in total
  • [1] Analysis of Gradient Descent Learning Algorithms for Multilayer Feedforward Neural Networks
    Guo, H
    Gelfand, SB
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, 1991, 38(8): 883-894
  • [2] Optimization of Graph Neural Networks with Natural Gradient Descent
    Izadi, Mohammad Rasool
    Fang, Yihao
    Stevenson, Robert
    Lin, Lizhen
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020: 171-179
  • [3] Dynamics of on-line gradient descent learning for multilayer neural networks
    Saad, D
    Solla, SA
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8: 302-308
  • [4] Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
    Zhang, Guodong
    Martens, James
    Grosse, Roger
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] A Convergence Analysis of Gradient Descent on Graph Neural Networks
    Awasthi, Pranjal
    Das, Abhimanyu
    Gollapudi, Sreenivas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [6] Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training
    Soudry, Daniel
    Di Castro, Dotan
    Gal, Asaf
    Kolodny, Avinoam
    Kvatinsky, Shahar
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26(10): 2408-2421
  • [7] Inversion of Neural Networks by Gradient Descent
    Kindermann, J
    Linden, A
    PARALLEL COMPUTING, 1990, 14(3): 277-286
  • [8] Gradient Descent for Spiking Neural Networks
    Huh, Dongsung
    Sejnowski, Terrence J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [9] Gradient Descent Analysis: On Visualizing the Training of Deep Neural Networks
    Becker, Martin
    Lippel, Jens
    Zielke, Thomas
    PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL 3: IVAPP, 2019: 338-345
  • [10] Nonlinear system identification using neural networks trained with natural gradient descent
    Ibnkahla, M
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2003, 2003(12): 1229-1237