On-line learning theory of soft committee machines with correlated hidden units - Steepest gradient descent and natural gradient descent

Cited: 20
Authors
Inoue, M [1 ]
Park, H
Okada, M
Affiliations
[1] RIKEN, Brain Sci Inst, Lab Math Neurosci, Wako, Saitama 3510198, Japan
[2] Kyoto Univ, Grad Sch Med, Dept Otolaryngol Head & Neck Surg, Kyoto 6068507, Japan
[3] JST, PRESTO, Intelligent Cooperat & Control, Wako, Saitama 3510198, Japan
Keywords
natural gradient descent; perceptron; soft committee machine; singularity; saddle; plateau
DOI
10.1143/JPSJ.72.805
Chinese Library Classification
O4 [Physics]
Discipline Classification Code
0702
Abstract
The permutation symmetry of the hidden units in multilayer perceptrons gives rise to saddle structures and plateaus in the learning dynamics of gradient-based methods. The correlation between the weight vectors of the hidden units in a teacher network is thought to affect this saddle structure and thereby prolong the learning time, but the mechanism has remained unclear. In this paper, we analyze this mechanism for soft committee machines under on-line learning, using statistical mechanics. Conventional steepest gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors increases. In contrast, in the limit of a small learning rate, natural gradient descent exhibits no plateaus regardless of the correlation. Analytical results for the dynamics around the saddle point support these findings.
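The contrast described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's statistical-mechanics calculation: a K = 2 soft committee machine with erf activations learns on-line from a two-unit teacher whose weight vectors have overlap C, using either steepest gradient descent or a natural-gradient step that preconditions with a running Fisher-matrix estimate (assuming a Gaussian regression model, so the Fisher matrix is E[∇f ∇fᵀ]). All sizes, rates, and the damping and regularization constants are illustrative assumptions.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

N, K = 20, 2          # input dimension and number of hidden units (illustrative)
eta = 0.05            # learning rate (illustrative)
steps = 20000
C = 0.5               # overlap between the two teacher weight vectors

g  = lambda u: erf(u / np.sqrt(2.0))                        # hidden-unit activation
dg = lambda u: np.sqrt(2.0 / np.pi) * np.exp(-u**2 / 2.0)   # its derivative

# Teacher: two unit-norm weight vectors with mutual overlap C.
b1 = rng.standard_normal(N)
b1 /= np.linalg.norm(b1)
b2 = rng.standard_normal(N)
b2 -= (b2 @ b1) * b1
b2 /= np.linalg.norm(b2)
B = np.stack([b1, C * b1 + np.sqrt(1.0 - C**2) * b2])

def run(natural, lam=1e-3, rho=0.999):
    """On-line learning of the teacher, optionally natural-gradient preconditioned."""
    W = 0.1 * rng.standard_normal((K, N))   # student weights, small random start
    F = np.eye(K * N)                       # running Fisher-matrix estimate
    errs = np.empty(steps)
    for t in range(steps):
        x = rng.standard_normal(N)          # fresh example each step (on-line)
        hs, ht = W @ x, B @ x
        err = g(hs).sum() - g(ht).sum()     # student output minus teacher output
        grad = err * dg(hs)[:, None] * x[None, :]       # dE/dW for E = err**2 / 2
        if natural:
            jac = (dg(hs)[:, None] * x[None, :]).ravel()    # d(output)/dW, flattened
            F = rho * F + (1.0 - rho) * np.outer(jac, jac)  # EMA Fisher (Gaussian model)
            step = np.linalg.solve(F + lam * np.eye(K * N),
                                   grad.ravel()).reshape(K, N)
        else:
            step = grad
        W -= (eta / N) * step
        errs[t] = 0.5 * err**2
    return errs

e_sgd = run(natural=False)   # steepest gradient descent: long symmetric plateau
e_ngd = run(natural=True)    # natural gradient: plateau largely suppressed
```

A moving average of e_sgd should show a plateau that lengthens as C approaches 1, while e_ngd escapes it quickly, consistent with the abstract. The regularizer lam is not cosmetic here: near the permutation-symmetric saddle the Fisher matrix becomes singular, which is exactly the structure the paper analyzes.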
Pages: 805-810
Number of pages: 6
Related Papers
27 items in total
  • [1] Natural gradient descent for on-line learning
    Rattray, M
    Saad, D
    Amari, S
    [J]. PHYSICAL REVIEW LETTERS, 1998, 81 (24) : 5461 - 5464
  • [2] Dynamics of the adaptive natural gradient descent method for soft committee machines
    Inoue, M
    Park, H
    Okada, M
    [J]. PHYSICAL REVIEW E, 2004, 69 (5)
  • [3] Dynamics of on-line gradient descent learning for multilayer neural networks
    Saad, D
    Solla, SA
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 302 - 308
  • [4] Theoretical analysis of batch and on-line training for gradient descent learning in neural networks
    Nakama, Takehiko
    [J]. NEUROCOMPUTING, 2009, 73 (1-3) : 151 - 159
  • [5] The efficiency and the robustness of natural gradient descent learning rule
    Yang, HH
    Amari, S
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 10, 1998, 10 : 385 - 391
  • [6] Gradient Descent Observer for On-Line Battery Parameter and State Coestimation
    Kruger, Eiko
    Al Shakarchi, Franck
    Quoc Tuan Tran
    [J]. 2016 IEEE/IAS 52ND INDUSTRIAL AND COMMERCIAL POWER SYSTEMS TECHNICAL CONFERENCE (I&CPS), 2016
  • [7] Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins
    Frei, Spencer
    Cao, Yuan
    Gu, Quanquan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [8] Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy
    Koloskova, Anastasia
    McKenna, Ryan
    Charles, Zachary
    Rush, Keith
    McMahan, Brendan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [9] Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines
    Liu, Siting
    Jiang, Honglan
    Liu, Leibo
    Han, Jie
    [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (11) : 2530 - 2541