Projective Fisher Information for Natural Gradient Descent

Cited by: 1
Authors:
Kaul, Piyush [1]
Lall, Brejesh [1]
Affiliation:
[1] Indian Institute of Technology Delhi, Department of Electrical Engineering, New Delhi, 110016, India
Source: IEEE Transactions on Artificial Intelligence
Keywords:
Complex networks; Covariance matrix; Deep neural networks; Fisher information matrix; Gradient methods; Learning algorithms
DOI:
10.1109/TAI.2022.3153593
Abstract:
Improvements in neural network optimization algorithms have enabled shorter training times and state-of-the-art performance on various machine learning tasks. Fisher-information-based natural gradient descent is one such second-order method: it improves both the convergence speed and the final performance metric of many machine learning algorithms. Fisher information matrices are also helpful for analyzing the properties and expected behavior of neural networks. However, natural gradient descent is a high-complexity method because covariance matrices must be maintained and inverted. This is especially the case for modern deep neural networks, whose very large parameter counts often make the problem computationally infeasible. We suggest using the Fisher information to analyze the parameter space of fully connected and convolutional neural networks without calculating the matrix itself. We also propose a lower-complexity natural gradient descent algorithm based on projecting the Kronecker factors of the Fisher information, combined with recursive calculation of their inverses, which is both computationally cheaper and more stable. Finally, we share analysis and results showing that these optimizations do not affect accuracy while considerably lowering the complexity of the optimization process. These improvements should enable natural gradient descent methods to be applied to neural networks with more parameters than previously possible. © 2020 IEEE.
Pages: 304–314
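
The abstract outlines a Kronecker-factored natural gradient scheme, but this record contains no equations or code. As a reading aid, here is a minimal sketch of the generic KFAC-style update such methods build on, for a single fully connected layer. All variable names and shapes (a, g, A, G, damping) are illustrative assumptions, and the paper's specific contributions (projecting the Kronecker factors and computing their inverses recursively) are not reproduced here.

```python
import numpy as np

# Minimal KFAC-style natural gradient step for one fully connected layer.
# A generic sketch, NOT the paper's exact algorithm (the record gives only
# the abstract); all names, shapes, and constants are assumptions.

rng = np.random.default_rng(0)
n_in, n_out, batch = 64, 32, 128
W = rng.normal(scale=0.1, size=(n_out, n_in))  # layer weights

# Quantities ordinary backprop already produces on a mini-batch:
# a = layer inputs (activations), g = pre-activation loss gradients.
a = rng.normal(size=(batch, n_in))
g = rng.normal(size=(batch, n_out))

# Euclidean mini-batch gradient of the loss w.r.t. W.
grad_W = g.T @ a / batch

# Kronecker factors of this layer's Fisher block, F ~= A (x) G:
# A = E[a a^T] is n_in x n_in, G = E[g g^T] is n_out x n_out. Keeping and
# inverting these two small factors replaces the full (n_in*n_out)-dimensional
# block, which is what makes natural gradient tractable at all.
damping = 1e-3
A = a.T @ a / batch + damping * np.eye(n_in)
G = g.T @ g / batch + damping * np.eye(n_out)

# Kronecker identity: (A (x) G)^-1 vec(grad_W) = vec(G^-1 grad_W A^-1),
# so F itself is never materialized or inverted.
nat_grad = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

W -= 0.1 * nat_grad  # natural gradient descent step
```

In a full optimizer, A and G would be tracked as running averages across steps, and the recursive inverse calculation the abstract mentions would replace the explicit solve/inv calls above; those details are not recoverable from this record.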