Convolutional Neural Network Training with Distributed K-FAC

Cited by: 6
Authors
Pauloski, J. Gregory [2]
Zhang, Zhao [1]
Huang, Lei [1]
Xu, Weijia [1]
Foster, Ian T. [3,4]
Affiliations
[1] Texas Advanced Computing Center, Austin, TX 78758 USA
[2] University of Texas at Austin, Austin, TX 78712 USA
[3] University of Chicago, Chicago, IL 60637 USA
[4] Argonne National Laboratory, Argonne, IL 60439 USA
Keywords
optimization methods; neural networks; scalability; high performance computing; OPTIMIZATION;
DOI
10.1109/SC41405.2020.00098
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale. We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. We use residual neural networks (ResNet) applied to the CIFAR-10 and ImageNet-1k datasets to evaluate the correctness and scalability of our K-FAC gradient preconditioner. With ResNet-50 on the ImageNet-1k dataset, our distributed K-FAC implementation converges to the 75.9% MLPerf baseline in 18-25% less time than does the classic stochastic gradient descent (SGD) optimizer across scales on a GPU cluster.
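For background, a minimal sketch of the standard per-layer K-FAC approximation (following Martens and Grosse, 2015), assuming a layer i with weight matrix W_i, input activations a_{i-1}, and back-propagated pre-activation gradients g_i; this illustrates the general technique the abstract refers to, not the specific distributed scheme of the paper above:

    F_i \approx A_{i-1} \otimes G_i, \qquad
    A_{i-1} = \mathbb{E}\!\left[ a_{i-1} a_{i-1}^{\top} \right], \qquad
    G_i = \mathbb{E}\!\left[ g_i g_i^{\top} \right]

    W_i \leftarrow W_i - \eta \, G_i^{-1} \left( \nabla_{W_i} \mathcal{L} \right) A_{i-1}^{-1}

Because the Kronecker factors A_{i-1} and G_i are far smaller than the full Fisher block F_i, each layer's preconditioner can be computed and inverted (or eigendecomposed) independently, which is what makes layer-wise distribution of the K-FAC work across processors natural.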
Pages: 12
Related Papers
50 records in total
  • [1] Deep Neural Network Training With Distributed K-FAC
    Pauloski, J. Gregory
    Huang, Lei
    Xu, Weijia
    Chard, Kyle
    Foster, Ian T.
    Zhang, Zhao
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3616 - 3627
  • [2] Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning
    Zhang, Lin
    Shi, Shaohuai
    Wang, Wei
    Li, Bo
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2023, 11 (03) : 2365 - 2378
  • [3] Inefficiency of K-FAC for Large Batch Size Training
    Ma, Linjian
    Montague, Gabe
    Ye, Jiayu
    Yao, Zhewei
    Gholami, Amir
    Keutzer, Kurt
    Mahoney, Michael W.
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5053 - 5060
  • [4] Optimizing Q-Learning with K-FAC Algorithm
    Beltiukov, Roman
    [J]. ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS (AIST 2019), 2020, 1086 : 3 - 8
  • [5] Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
    Shi, Shaohuai
    Zhang, Lin
    Li, Bo
    [J]. 2021 IEEE 41ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2021), 2021, : 550 - 560
  • [6] K-FAC Algorithm Based on the Sherman-Morrison Formula (基于Sherman-Morrison公式的K-FAC算法)
    刘小雷 (Liu Xiaolei)
    高凯新 (Gao Kaixin)
    王勇 (Wang Yong)
    [J]. 计算机系统应用 (Computer Systems & Applications), 2021, 30 (04) : 118 - 124
  • [7] Randomized K-FACs: Speeding Up K-FAC with Randomized Numerical Linear Algebra
    Puiu, Constantin Octavian
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2022, 2022, 13756 : 411 - 422
  • [8] Performance Modeling for Distributed Training of Convolutional Neural Networks
    Castello, Adrian
    Catalan, Mar
    Dolz, Manuel F.
    Mestre, Jose, I
    Quintana-Orti, Enrique S.
    Duato, Jose
    [J]. 2021 29TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2021), 2021, : 99 - 108
  • [9] A novel training algorithm for convolutional neural network
    Anuse, Alwin
    Vyas, Vibha
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2016, 2 (03) : 221 - 234