Accelerating distributed deep neural network training with pipelined MPI allreduce

Cited by: 6
Authors
Castello, Adrian [1 ]
Quintana-Orti, Enrique S. [1 ]
Duato, Jose [1 ]
Affiliation
[1] Univ Politecn Valencia, Valencia, Spain
Keywords
Message Passing Interface (MPI); Collective communication primitives; Allreduce; Deep learning; Distributed training; Collective communication
DOI
10.1007/s10586-021-03370-9
Chinese Library Classification (CLC)
TP [Automation technology; Computer technology]
Discipline Code
0812
Abstract
TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD in turn utilizes a blocking Allreduce primitive to share information among processes, combined with a communication thread to overlap communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3) we explore the benefits of applying pipelining to the communication exchange, demonstrating that these improvements carry over to distributed training via TF+HVD. Finally, (4) we show that pipelining can also boost performance for applications that make heavy use of other collectives, such as Broadcast and Reduce-Scatter.
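For context on point (3), the sketch below is a minimal, illustrative example (not the authors' implementation) of how a pipelined Allreduce can be built from the standard MPI-3 non-blocking MPI_Iallreduce: the message is split into segments, one non-blocking reduction is posted per segment, and a final MPI_Waitall preserves the blocking behaviour at the application level. The buffer size TOTAL and segment count NSEG are arbitrary illustrative values, not taken from the paper.

/* Minimal sketch: segmented (pipelined) Allreduce via MPI_Iallreduce.
 * TOTAL and NSEG are illustrative; TOTAL is assumed divisible by NSEG. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TOTAL (1 << 22)   /* number of floats to reduce (example size) */
#define NSEG  8           /* number of pipeline segments (tunable)     */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *sendbuf = malloc(TOTAL * sizeof *sendbuf);
    float *recvbuf = malloc(TOTAL * sizeof *recvbuf);
    for (int i = 0; i < TOTAL; i++) sendbuf[i] = (float)rank;

    /* Baseline: a single blocking call over the full buffer. */
    MPI_Allreduce(sendbuf, recvbuf, TOTAL, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    /* Pipelined variant: post one non-blocking Allreduce per segment,
     * then wait for all of them, so the overall call remains blocking. */
    MPI_Request reqs[NSEG];
    int seg = TOTAL / NSEG;
    for (int s = 0; s < NSEG; s++)
        MPI_Iallreduce(sendbuf + s * seg, recvbuf + s * seg, seg,
                       MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD, &reqs[s]);
    MPI_Waitall(NSEG, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0) printf("recvbuf[0] = %f\n", recvbuf[0]);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Segmenting the reduction lets the MPI library progress several smaller collectives concurrently, which is the pipelining effect the abstract refers to; the optimal segment count depends on message size, network, and the Allreduce algorithm selected by the library.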
Pages: 3797-3813
Number of pages: 17