Convolutional Neural Network Training with Distributed K-FAC

Cited by: 6
Authors
Pauloski, J. Gregory [2]
Zhang, Zhao [1]
Huang, Lei [1]
Xu, Weijia [1]
Foster, Ian T. [3,4]
Affiliations
[1] Texas Advanced Computing Center, Austin, TX 78758 USA
[2] University of Texas at Austin, Austin, TX 78712 USA
[3] University of Chicago, Chicago, IL 60637 USA
[4] Argonne National Laboratory, Argonne, IL 60439 USA
Keywords
optimization methods; neural networks; scalability; high performance computing; OPTIMIZATION;
DOI
10.1109/SC41405.2020.00098
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale. We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. We use residual neural networks (ResNet) applied to the CIFAR-10 and ImageNet-1k datasets to evaluate the correctness and scalability of our K-FAC gradient preconditioner. With ResNet-50 on the ImageNet-1k dataset, our distributed K-FAC implementation converges to the 75.9% MLPerf baseline in 18-25% less time than does the classic stochastic gradient descent (SGD) optimizer across scales on a GPU cluster.
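For background, a minimal sketch of the standard per-layer K-FAC approximation (following Martens and Grosse, 2015), assuming a layer i with weight matrix W_i, input activations a_{i-1}, and back-propagated pre-activation gradients g_i; this illustrates the general technique the abstract refers to, not the specific distributed scheme of the paper above:

    F_i \approx A_{i-1} \otimes G_i, \qquad
    A_{i-1} = \mathbb{E}\!\left[ a_{i-1} a_{i-1}^{\top} \right], \qquad
    G_i = \mathbb{E}\!\left[ g_i g_i^{\top} \right]

    W_i \leftarrow W_i - \eta \, G_i^{-1} \left( \nabla_{W_i} \mathcal{L} \right) A_{i-1}^{-1}

Because the Kronecker factors A_{i-1} and G_i are far smaller than the full Fisher block F_i, each layer's preconditioner can be computed and inverted (or eigendecomposed) independently, which is what makes layer-wise distribution of the K-FAC work across processors natural.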
Pages: 12
Related Papers
50 records in total
  • [1] Deep Neural Network Training With Distributed K-FAC
    Pauloski, J. Gregory
    Huang, Lei
    Xu, Weijia
    Chard, Kyle
    Foster, Ian T.
    Zhang, Zhao
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3616 - 3627
  • [2] Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning
    Zhang, Lin
    Shi, Shaohuai
    Wang, Wei
    Li, Bo
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2023, 11 (03) : 2365 - 2378
  • [3] Inefficiency of K-FAC for Large Batch Size Training
    Ma, Linjian
    Montague, Gabe
    Ye, Jiayu
    Yao, Zhewei
    Gholami, Amir
    Keutzer, Kurt
    Mahoney, Michael W.
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5053 - 5060
  • [4] Optimizing Q-Learning with K-FAC Algorithm
    Beltiukov, Roman
    [J]. ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS (AIST 2019), 2020, 1086 : 3 - 8
  • [5] Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
    Shi, Shaohuai
    Zhang, Lin
    Li, Bo
    [J]. 2021 IEEE 41ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2021), 2021, : 550 - 560
  • [6] K-FAC Algorithm Based on the Sherman-Morrison Formula (基于Sherman-Morrison公式的K-FAC算法)
    刘小雷 (Liu Xiaolei)
    高凯新 (Gao Kaixin)
    王勇 (Wang Yong)
    [J]. 计算机系统应用 (Computer Systems & Applications), 2021, 30 (04) : 118 - 124
  • [7] Randomized K-FACs: Speeding Up K-FAC with Randomized Numerical Linear Algebra
    Puiu, Constantin Octavian
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2022, 2022, 13756 : 411 - 422
  • [8] Performance Modeling for Distributed Training of Convolutional Neural Networks
    Castello, Adrian
    Catalan, Mar
    Dolz, Manuel F.
    Mestre, Jose, I
    Quintana-Orti, Enrique S.
    Duato, Jose
    [J]. 2021 29TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2021), 2021, : 99 - 108
  • [9] A novel training algorithm for convolutional neural network
    Anuse, Alwin
    Vyas, Vibha
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2016, 2 (03) : 221 - 234