Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Cited by: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; NATURAL GRADIENT;
DOI: 10.1109/TCC.2022.3205918
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, which incurs large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that assigns the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence properties of existing D-KFAC algorithms but also delivers three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update compared to state-of-the-art D-KFAC methods.
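The abstract describes the scheme only at a high level: each worker constructs and inverts the Kronecker factors for a disjoint subset of layers and preconditions those layers' gradients locally, so KFs themselves never need to be communicated. The following single-process NumPy sketch illustrates that idea under stated assumptions; the round-robin assign_layers helper, the simple Tikhonov damping, and all tensor shapes are illustrative choices, not details taken from the paper.

# Minimal single-process sketch of distributed K-FAC preconditioning (assumptions,
# not the authors' implementation): a round-robin layer-to-worker assignment spreads
# KF construction, and each worker preconditions only the layers it owns.
import numpy as np

def kronecker_factors(acts, grads):
    # A = E[a a^T] from layer inputs, G = E[g g^T] from output gradients.
    batch = acts.shape[0]
    A = acts.T @ acts / batch
    G = grads.T @ grads / batch
    return A, G

def precondition(grad_W, A, G, damping=1e-3):
    # (A kron G)^{-1} vec(grad_W) == vec(G^{-1} grad_W A^{-1}); simple damping assumed.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_W @ A_inv

def assign_layers(num_layers, num_workers):
    # Round-robin layer ownership (an illustrative assignment policy).
    return {l: l % num_workers for l in range(num_layers)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_layers, num_workers, rank = 4, 2, 0        # pretend we are worker 0
    owner = assign_layers(num_layers, num_workers)
    for l in range(num_layers):
        if owner[l] != rank:
            continue                                # another worker builds this layer's KFs
        acts = rng.standard_normal((32, 16))        # layer inputs a (batch x in_dim)
        grads = rng.standard_normal((32, 8))        # output grads g (batch x out_dim)
        grad_W = grads.T @ acts / 32                # averaged weight gradient (out x in)
        A, G = kronecker_factors(acts, grads)
        update = precondition(grad_W, A, G)
        print(f"layer {l}: preconditioned update shape {update.shape}")

In a real multi-worker run, the locally preconditioned gradients would then be shared (e.g., broadcast or all-gathered) so that every worker applies the same update; only that exchange of preconditioned gradients, not of the KFs, would cross the network.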
Pages: 2365-2378
Page count: 14