Accelerating distributed deep neural network training with pipelined MPI allreduce

Cited: 6
Authors
Castelló, Adrián [1]
Quintana-Ortí, Enrique S. [1]
Duato, José [1]
Affiliation
[1] Universitat Politècnica de València, Valencia, Spain
Keywords
Message Passing Interface (MPI); Collective communication primitives; Allreduce; Deep learning; Distributed training; Collective communication
DOI
10.1007/s10586-021-03370-9
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD in turn utilizes a blocking Allreduce primitive to share information among processes, combined with a communication thread to overlap communication with computation. In this work, we perform a thorough experimental analysis that exposes (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3) we explore the benefits of applying pipelining to the communication exchange, demonstrating that these improvements carry over to distributed training via TF+HVD. Finally, (4) we show that pipelining can also boost performance for applications that make heavy use of other collectives, such as Broadcast and Reduce-Scatter.
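As a concrete illustration of the idea described in the abstract (a minimal sketch, not the authors' implementation), the following C/MPI example contrasts a single blocking MPI_Allreduce over a gradient buffer with a pipelined variant that issues one non-blocking MPI_Iallreduce per fixed-size segment and waits for all of them at the end, preserving the blocking semantics HVD expects while allowing the segments to progress and overlap inside the MPI library. The segment size SEG and the helper names allreduce_blocking and allreduce_pipelined are illustrative assumptions, not values or identifiers taken from the paper.

```c
/*
 * Minimal sketch: blocking vs. pipelined (segmented, non-blocking) Allreduce.
 * SEG and the helper names are assumptions for illustration only.
 */
#include <mpi.h>
#include <stdlib.h>

#define SEG (1 << 20)   /* segment size in elements (assumed tunable) */

/* Baseline: one blocking call over the full gradient buffer. */
static void allreduce_blocking(float *grad, int count, MPI_Comm comm) {
    MPI_Allreduce(MPI_IN_PLACE, grad, count, MPI_FLOAT, MPI_SUM, comm);
}

/* Pipelined variant: split the buffer into segments, start one non-blocking
 * Allreduce per segment, and wait for all of them. The final MPI_Waitall
 * keeps the call blocking from the caller's point of view, while the
 * per-segment operations can overlap inside the MPI library. */
static void allreduce_pipelined(float *grad, int count, MPI_Comm comm) {
    int nseg = (count + SEG - 1) / SEG;
    MPI_Request *reqs = malloc(nseg * sizeof(MPI_Request));
    for (int s = 0; s < nseg; s++) {
        int offset = s * SEG;
        int len = (offset + SEG <= count) ? SEG : count - offset;
        MPI_Iallreduce(MPI_IN_PLACE, grad + offset, len,
                       MPI_FLOAT, MPI_SUM, comm, &reqs[s]);
    }
    MPI_Waitall(nseg, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int count = 1 << 24;                        /* e.g., 16M gradient entries */
    float *grad = calloc(count, sizeof(float));
    allreduce_blocking(grad, count, MPI_COMM_WORLD);    /* baseline */
    allreduce_pipelined(grad, count, MPI_COMM_WORLD);   /* pipelined variant */
    free(grad);
    MPI_Finalize();
    return 0;
}
```

The same segmentation pattern can, in principle, be applied to other MPI-3 non-blocking collectives such as MPI_Ibcast or MPI_Ireduce_scatter, which is the kind of extension point (4) of the abstract refers to.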
Pages: 3797-3813
Number of pages: 17