Projective Fisher Information for Natural Gradient Descent

Cited by: 1
Authors:
Kaul, Piyush [1 ]
Lall, Brejesh [1 ]
Affiliations:
[1] Indian Institute of Technology - Delhi, Department of Electrical Engineering, New Delhi 110016, India
Keywords:
Complex networks; Covariance matrix; Deep neural networks; Fisher information matrix; Gradient methods; Learning algorithms
DOI: 10.1109/TAI.2022.3153593
Abstract:
Improvements in neural network optimization algorithms have enabled shorter training times and state-of-the-art performance on various machine learning tasks. Fisher-information-based natural gradient descent is one such second-order method that improves both the convergence speed and the final performance metric for many machine learning algorithms. Fisher information matrices are also helpful for analyzing the properties and expected behavior of neural networks. However, natural gradient descent is a high-complexity method because it must maintain and invert covariance matrices. This is especially true for modern deep neural networks, which have a very large number of parameters and for which the problem often becomes computationally infeasible. We suggest using the Fisher information to analyze the parameter space of fully connected and convolutional neural networks without calculating the matrix itself. We also propose a lower-complexity natural gradient descent algorithm based on projecting the Kronecker factors of the Fisher information, combined with recursive calculation of inverses, which is computationally cheaper and more stable. We finally share analysis and results showing that these optimizations do not affect accuracy while considerably lowering the complexity of the optimization process. These improvements should enable applying natural gradient descent to neural networks with a larger number of parameters than was previously possible. © 2020 IEEE.
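The core update the abstract refers to — preconditioning the gradient with the inverse (damped) Fisher matrix — can be illustrated on a toy logistic-regression problem. This is a generic, minimal natural-gradient sketch using the empirical Fisher, not the paper's projected-Kronecker-factor method; all names and hyperparameters are illustrative:

```python
import numpy as np

# Toy example of natural gradient descent on logistic regression:
#   theta <- theta - lr * (F + damping*I)^{-1} * grad,
# where F is the empirical Fisher (average of per-sample gradient
# outer products). Hyperparameters here are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = (X @ true_w > 0).astype(float)          # synthetic labels

def loss_and_grads(w):
    z = np.clip(X @ w, -30, 30)             # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))            # predicted probabilities
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    per_sample = (p - y)[:, None] * X       # per-sample gradients, shape (N, d)
    return loss, per_sample

w = np.zeros(5)
lr, damping = 0.5, 1e-3
losses = []
for _ in range(50):
    loss, g = loss_and_grads(w)
    losses.append(loss)
    grad = g.mean(axis=0)
    F = g.T @ g / len(X)                    # empirical Fisher matrix, (d, d)
    w -= lr * np.linalg.solve(F + damping * np.eye(5), grad)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Even this toy version shows why the method is expensive at scale: forming and solving against F costs O(d^2) memory and O(d^3) time in the number of parameters d, which is exactly the bottleneck the paper's Kronecker-factor projection and recursive inverse updates aim to reduce.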
Pages: 304-314