Dynamics of learning in multilayer perceptrons near singularities

Cited by: 42
Authors
Cousseau, Florent [1 ,2 ]
Ozeki, Tomoko [1 ,3 ]
Amari, Shun-ichi [1 ]
Affiliations
[1] RIKEN, Brain Sci Inst, Amari Unit Math Neurosci, Wako, Saitama 3510198, Japan
[2] Univ Tokyo, Grad Sci Frontier Sci, Dept Complex Sci & Engn, Chiba 2778561, Japan
[3] Tokai Univ, Dept Human & Informat Sci, Kanagawa 2591292, Japan
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2008, Vol. 19, No. 8
Keywords
dynamics of learning; multilayer perceptrons; natural gradient (NGD) learning; singularity; standard gradient (SGD) learning;
DOI
10.1109/TNN.2008.2000391
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Numbers
081104; 0812; 0835; 1405
Abstract
The dynamical behavior of learning in the multilayer perceptron is known to be very slow, often becoming trapped in a "plateau." It has recently been understood that this is due to singularities in the parameter space of perceptrons, in which the trajectories of learning are drawn. The space is Riemannian from the point of view of information geometry and contains singular regions where the Riemannian metric, or the Fisher information matrix, degenerates. This paper analyzes the dynamics of learning in a neighborhood of the singular regions when the true teacher machine lies at the singularity. We give explicit asymptotic analytical solutions (trajectories) for both the standard gradient (SGD) and natural gradient (NGD) methods. It is clearly shown, in the case of the SGD method, that the plateau phenomenon appears in a neighborhood of the critical regions, where the dynamical behavior is extremely slow. The analysis of the NGD method is much more difficult, because the inverse of the Fisher information matrix diverges. We overcome this difficulty by introducing the "blow-down" technique used in algebraic geometry. The NGD method works efficiently, and the state converges directly to the true parameters very quickly, whereas the SGD method staggers near the singularity. The analytical results are compared with computer simulations, showing good agreement. The effects of singularities on learning are thus qualitatively clarified for both the standard and NGD methods.
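The contrast the abstract describes can be illustrated numerically. The following is a minimal sketch, not the paper's construction: a two-hidden-unit soft-committee student whose teacher lies exactly on the singular line w1 = w2, where the Fisher matrix degenerates. The damping term added to the Fisher matrix is an assumption made here for numerical stability; the paper instead handles the diverging Fisher inverse analytically via the blow-down technique.

```python
import numpy as np

# Student: y = 0.5*tanh(w1*x) + 0.5*tanh(w2*x).
# Teacher tanh(1.5*x) lies on the singular line w1 = w2, where the
# two hidden units coincide and the Fisher matrix loses rank.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.tanh(1.5 * x)  # teacher placed at the singularity

def predict(w, x):
    return 0.5 * np.tanh(w[0] * x) + 0.5 * np.tanh(w[1] * x)

def loss(w, x, y):
    return 0.5 * np.mean((predict(w, x) - y) ** 2)

def output_jacobian(w, x):
    # Per-sample gradients of the student output w.r.t. (w1, w2).
    d1 = 0.5 * (1 - np.tanh(w[0] * x) ** 2) * x
    d2 = 0.5 * (1 - np.tanh(w[1] * x) ** 2) * x
    return np.stack([d1, d2], axis=1)

def grad(w, x, y):
    err = predict(w, x) - y
    return output_jacobian(w, x).T @ err / len(x)

w_init = np.array([0.7, 0.9])
init_loss = loss(w_init, x, y)

# Standard (batch) gradient descent: slow near the singular region.
w_sgd = w_init.copy()
for _ in range(300):
    w_sgd -= 0.5 * grad(w_sgd, x, y)

# Natural gradient descent: precondition by the empirical Fisher
# matrix, damped because the raw Fisher inverse diverges at w1 = w2.
w_ngd = w_init.copy()
for _ in range(300):
    J = output_jacobian(w_ngd, x)
    F = J.T @ J / len(x) + 1e-3 * np.eye(2)
    w_ngd -= 0.2 * np.linalg.solve(F, grad(w_ngd, x, y))

print("initial loss:", init_loss)
print("SGD final loss:", loss(w_sgd, x, y), "params:", w_sgd)
print("NGD final loss:", loss(w_ngd, x, y), "params:", w_ngd)
```

Both trajectories reduce the loss; plotting the loss per iteration would show the SGD curve flattening into a plateau near w1 ≈ w2 while the damped NGD curve drops quickly, mirroring the paper's qualitative conclusion.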
Pages: 1313-1328
Page count: 16
Related Papers
50 records in total
  • [1] Application of the Error Function in Analyzing the Learning Dynamics Near Singularities of the Multilayer Perceptrons
    Guo Weili
    Wei Haikun
    Zhao Junsheng
    Li Weiling
    Zhang Kanjian
    PROCEEDINGS OF THE 31ST CHINESE CONTROL CONFERENCE, 2012, : 3240 - 3243
  • [2] Theoretical and numerical analysis of learning dynamics near singularity in multilayer perceptrons
    Guo, Weili
    Wei, Haikun
    Zhao, Junsheng
    Zhang, Kanjian
    NEUROCOMPUTING, 2015, 151 : 390 - 400
  • [3] Geometrical singularities in the neuromanifold of multilayer Perceptrons
    Amari, S
    Park, H
    Ozeki, T
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 343 - 350
  • [4] Online learning dynamics of multilayer perceptrons with unidentifiable parameters
    Park, H
    Inoue, M
    Okada, M
    JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 2003, 36 (47): : 11753 - 11764
  • [5] Efficiently learning multilayer perceptrons
    Bunzmann, C
    Biehl, M
    Urbanczik, R
    PHYSICAL REVIEW LETTERS, 2001, 86 (10) : 2166 - 2169
  • [6] Geometry of learning in multilayer perceptrons
    Amari, S
    Park, H
    Ozeki, T
    COMPSTAT 2004: PROCEEDINGS IN COMPUTATIONAL STATISTICS, 2004, : 49 - 60
  • [7] Active learning in multilayer perceptrons
    Fukumizu, K
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 295 - 301
  • [8] NONLINEAR DYNAMICS OF FEEDBACK MULTILAYER PERCEPTRONS
    BAUER, HU
    GEISEL, T
    PHYSICAL REVIEW A, 1990, 42 (04): : 2401 - 2409
  • [9] Statistical active learning in multilayer perceptrons
    Fukumizu, K
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2000, 11 (01): : 17 - 26
  • [10] A Hybrid Learning Method for Multilayer Perceptrons
    Zhou Meide; Huang Wenhu; Hong Jiarong (School of Astronautics)
    Journal of Harbin Institute of Technology, 1990, (03): 52 - 61