The efficiency and the robustness of natural gradient descent learning rule

Cited: 0
Authors
Yang, HH [1 ]
Amari, S [1 ]
Affiliation
[1] Oregon Grad Inst, Dept Comp Sci, Portland, OR 97291 USA
Keywords
DOI
Not available
Chinese Library Classification: B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology]
Discipline codes: 03; 0303; 030303; 04; 0402
Abstract
The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer and multi-layer perceptrons. We have discovered a new scheme to represent the Fisher information matrix of a stochastic multi-layer perceptron. Based on this scheme, we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). It is confirmed by simulations that the natural gradient descent learning rule is not only efficient but also robust.
Pages: 385-391
Page count: 7
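
A minimal NumPy sketch of the natural gradient update the abstract describes, w <- w - eta * F^{-1} grad, where F is the Fisher information matrix. The single-layer linear model with Gaussian noise, the batch size, and the step size eta are illustrative assumptions; the paper's O(n) scheme for computing the natural gradient in a multi-layer perceptron is not reproduced here, and F is inverted by a direct solve instead.

import numpy as np

rng = np.random.default_rng(0)
n, batch, sigma2, eta = 20, 64, 0.1, 0.5   # assumed sizes, noise variance, step size
w_true = rng.normal(size=n)                # teacher weights (assumed single-layer model)
w = np.zeros(n)                            # student weights to be learned

for step in range(100):
    X = rng.normal(size=(batch, n))                    # minibatch of inputs
    y = X @ w_true + rng.normal(scale=sigma2 ** 0.5, size=batch)
    grad = X.T @ (X @ w - y) / (batch * sigma2)        # gradient of the Gaussian negative log-likelihood
    F = X.T @ X / (batch * sigma2) + 1e-6 * np.eye(n)  # empirical Fisher information, ridge term for stability
    w -= eta * np.linalg.solve(F, grad)                # natural gradient step: w <- w - eta * F^{-1} grad

print("parameter error:", np.linalg.norm(w - w_true))

For this linear-Gaussian model the natural gradient step coincides with a Newton step, which is why it converges in far fewer iterations than plain gradient descent; the robustness claim in the abstract refers to the multi-layer setting studied in the paper.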
Related Papers
50 items in total
  • [1] Rattray, M; Saad, D; Amari, S. Natural gradient descent for on-line learning. PHYSICAL REVIEW LETTERS, 1998, 81(24): 5461-5464.
  • [2] Andrychowicz, Marcin; Denil, Misha; Colmenarejo, Sergio Gomez; Hoffman, Matthew W.; Pfau, David; Schaul, Tom; Shillingford, Brendan; de Freitas, Nando. Learning to learn by gradient descent by gradient descent. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29.
  • [3] Fakhrahmad, S. M.; Jahromi, M. Zolghadri. A new rule-weight learning method based on gradient descent. WORLD CONGRESS ON ENGINEERING 2009, VOLS I AND II, 2009: 63+.
  • [4] Chen, Yutian; Hoffman, Matthew W.; Colmenarejo, Sergio Gomez; Denil, Misha; Lillicrap, Timothy P.; Botvinick, Matt; de Freitas, Nando. Learning to learn without gradient descent by gradient descent. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70.
  • [5] Thomas, Philip S.; da Silva, Bruno Castro; Dann, Christoph; Brunskill, Emma. Energetic natural gradient descent. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48.
  • [6] Inoue, M; Park, H; Okada, M. On-line learning theory of soft committee machines with correlated hidden units: steepest gradient descent and natural gradient descent. JOURNAL OF THE PHYSICAL SOCIETY OF JAPAN, 2003, 72(04): 805-810.
  • [7] Tu, Cheng-Hao; Chen, Hong-You; Carlyn, David; Chao, Wei-Lun. Learning fractals by gradient descent. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023: 2456-2464.
  • [8] Sun, Tao; Tang, Ke; Li, Dongsheng. Gradient descent learning with floats. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52(03): 1763-1771.
  • [9] Biehl, M; Schwarze, H. Learning by online gradient descent. JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 1995, 28(03): 643-656.
  • [10] Sum, John; Leung, Chi-Sing; Ho, Kevin. A limitation of gradient descent learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31(06): 2227-2232.