A K-FAC Algorithm Based on the Sherman-Morrison Formula

Cited by: 1
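Authors
Liu Xiaolei (刘小雷) [1]
Gao Kaixin (高凯新) [1]
Wang Yong (王勇) [1]
Affiliation
[1] School of Mathematics, Tianjin University
Keywords
deep learning; second-order optimization methods; K-FAC algorithm; Sherman-Morrison formula; Fisher information matrix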
DOI
10.15888/j.cnki.csa.007869
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
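Abstract
Second-order optimization methods can accelerate the training of deep neural networks, but their enormous computational cost makes them difficult to apply in practice. Consequently, many algorithms that approximate second-order methods have been proposed in recent years. The K-FAC algorithm offers an effective way to approximate the natural gradient. Building on K-FAC and drawing on the idea behind quasi-Newton methods, this paper proposes an improved K-FAC algorithm. It uses the standard K-FAC computation for a small number of initial iterations; in subsequent iterations it constructs a rank-1 matrix and performs the computation via the Sherman-Morrison formula, which greatly reduces the computational complexity. Experimental results show that the improved K-FAC algorithm performs comparably to, or even better than, K-FAC. In particular, it requires substantially less training time than K-FAC, and it retains some advantage in training time over first-order optimization methods.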
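The Sherman-Morrison formula named in the abstract states that, for an invertible matrix A and vectors u, v with 1 + v^T A^{-1} u ≠ 0, (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u). The sketch below illustrates only this identity: given a known inverse, a rank-1 correction is applied in O(n^2) operations instead of re-inverting from scratch. The abstract does not say how the paper constructs its rank-1 matrix, so the function name, the helper, and the 2×2 example are hypothetical and are not the authors' implementation.

```python
# Illustrative sketch only: names and the tiny example are hypothetical,
# not taken from the paper.

def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A_inv = A^{-1}, using O(n^2) operations."""
    n = len(u)
    Au = mat_vec(A_inv, u)  # A^{-1} u
    # v^T A^{-1}, computed as a row vector
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(vi * x for vi, x in zip(v, Au))  # 1 + v^T A^{-1} u
    if abs(denom) < 1e-12:
        raise ValueError("rank-1 update makes the matrix singular")
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]

# Example: A = diag(2, 4), so A^{-1} = diag(0.5, 0.25); u = v = [1, 1].
A_inv = [[0.5, 0.0], [0.0, 0.25]]
u = [1.0, 1.0]
v = [1.0, 1.0]
B_inv = sherman_morrison_update(A_inv, u, v)  # inverse of [[3, 1], [1, 5]]
```

In this example A + u v^T = [[3, 1], [1, 5]], whose exact inverse is (1/14)·[[5, -1], [-1, 3]]; the update reproduces it without a fresh inversion, which is the kind of saving the abstract attributes to the later iterations.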
Pages: 118-124
Page count: 7