Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Cited by: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; NATURAL GRADIENT;
DOI: 10.1109/TCC.2022.3205918
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, which incurs large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that assigns the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence properties of existing D-KFAC algorithms but also delivers three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update compared to state-of-the-art D-KFAC methods.
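The abstract describes the scheme only at a high level: each worker constructs and inverts the Kronecker factors for a disjoint subset of layers and preconditions those layers' gradients locally, so KFs themselves never need to be communicated. The following single-process NumPy sketch illustrates that idea under stated assumptions; the round-robin assign_layers helper, the simple Tikhonov damping, and all tensor shapes are illustrative choices, not details taken from the paper.

# Minimal single-process sketch of distributed K-FAC preconditioning (assumptions,
# not the authors' implementation): a round-robin layer-to-worker assignment spreads
# KF construction, and each worker preconditions only the layers it owns.
import numpy as np

def kronecker_factors(acts, grads):
    # A = E[a a^T] from layer inputs, G = E[g g^T] from output gradients.
    batch = acts.shape[0]
    A = acts.T @ acts / batch
    G = grads.T @ grads / batch
    return A, G

def precondition(grad_W, A, G, damping=1e-3):
    # (A kron G)^{-1} vec(grad_W) == vec(G^{-1} grad_W A^{-1}); simple damping assumed.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_W @ A_inv

def assign_layers(num_layers, num_workers):
    # Round-robin layer ownership (an illustrative assignment policy).
    return {l: l % num_workers for l in range(num_layers)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_layers, num_workers, rank = 4, 2, 0        # pretend we are worker 0
    owner = assign_layers(num_layers, num_workers)
    for l in range(num_layers):
        if owner[l] != rank:
            continue                                # another worker builds this layer's KFs
        acts = rng.standard_normal((32, 16))        # layer inputs a (batch x in_dim)
        grads = rng.standard_normal((32, 8))        # output grads g (batch x out_dim)
        grad_W = grads.T @ acts / 32                # averaged weight gradient (out x in)
        A, G = kronecker_factors(acts, grads)
        update = precondition(grad_W, A, G)
        print(f"layer {l}: preconditioned update shape {update.shape}")

In a real multi-worker run, the locally preconditioned gradients would then be shared (e.g., broadcast or all-gathered) so that every worker applies the same update; only that exchange of preconditioned gradients, not of the KFs, would cross the network.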
Pages: 2365-2378
Page count: 14