Accelerating distributed deep neural network training with pipelined MPI allreduce

Cited by: 6
Authors
Castello, Adrian [1 ]
Quintana-Orti, Enrique S. [1 ]
Duato, Jose [1 ]
Affiliation
[1] Univ Politecn Valencia, Valencia, Spain
Keywords
Message Passing Interface (MPI); Collective communication primitives; Allreduce; Deep learning; Distributed training; Collective communication
DOI
10.1007/s10586-021-03370-9
Chinese Library Classification (CLC)
TP [Automation technology; Computer technology]
Discipline Code
0812
Abstract
TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD in turn utilizes a blocking Allreduce primitive to share information among processes, combined with a communication thread to overlap communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3) we explore the benefits of applying pipelining to the communication exchange, demonstrating that these improvements carry over to distributed training via TF+HVD. Finally, (4) we show that pipelining can also boost performance for applications that make heavy use of other collectives, such as Broadcast and Reduce-Scatter.
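For context on point (3), the sketch below is a minimal, illustrative example (not the authors' implementation) of how a pipelined Allreduce can be built from the standard MPI-3 non-blocking MPI_Iallreduce: the message is split into segments, one non-blocking reduction is posted per segment, and a final MPI_Waitall preserves the blocking behaviour at the application level. The buffer size TOTAL and segment count NSEG are arbitrary illustrative values, not taken from the paper.

/* Minimal sketch: segmented (pipelined) Allreduce via MPI_Iallreduce.
 * TOTAL and NSEG are illustrative; TOTAL is assumed divisible by NSEG. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TOTAL (1 << 22)   /* number of floats to reduce (example size) */
#define NSEG  8           /* number of pipeline segments (tunable)     */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *sendbuf = malloc(TOTAL * sizeof *sendbuf);
    float *recvbuf = malloc(TOTAL * sizeof *recvbuf);
    for (int i = 0; i < TOTAL; i++) sendbuf[i] = (float)rank;

    /* Baseline: a single blocking call over the full buffer. */
    MPI_Allreduce(sendbuf, recvbuf, TOTAL, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    /* Pipelined variant: post one non-blocking Allreduce per segment,
     * then wait for all of them, so the overall call remains blocking. */
    MPI_Request reqs[NSEG];
    int seg = TOTAL / NSEG;
    for (int s = 0; s < NSEG; s++)
        MPI_Iallreduce(sendbuf + s * seg, recvbuf + s * seg, seg,
                       MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD, &reqs[s]);
    MPI_Waitall(NSEG, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0) printf("recvbuf[0] = %f\n", recvbuf[0]);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Segmenting the reduction lets the MPI library progress several smaller collectives concurrently, which is the pipelining effect the abstract refers to; the optimal segment count depends on message size, network, and the Allreduce algorithm selected by the library.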
Pages: 3797-3813
Number of pages: 17