Accelerating distributed deep neural network training with pipelined MPI allreduce

Cited by: 6
Authors
Castelló, Adrián [1]
Quintana-Ortí, Enrique S. [1]
Duato, José [1]
Affiliation
[1] Universitat Politècnica de València, Valencia, Spain
Keywords
Message Passing Interface (MPI); Collective communication primitives; Allreduce; Deep learning; Distributed training; Collective communication
DOI
10.1007/s10586-021-03370-9
Chinese Library Classification
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD, in turn, relies on a blocking Allreduce primitive to share information among processes, combined with a communication thread that overlaps communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3) we explore the benefits of applying pipelining to the communication exchange, demonstrating that these improvements carry over to distributed training via TF+HVD. Finally, (4) we show that pipelining can also boost performance for applications that make heavy use of other collectives, such as Broadcast and Reduce-Scatter.
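To make points (2) and (3) of the abstract concrete, the sketch below first wraps the non-blocking MPI_Iallreduce with an immediate MPI_Wait, which preserves the blocking semantics of MPI_Allreduce while possibly selecting a different algorithm inside the MPI library, and then shows a pipelined variant that partitions the buffer into segments and issues one MPI_Iallreduce per segment so that successive chunks can progress concurrently. This is a minimal C/MPI illustration under assumed parameters (float data, MPI_SUM reduction, an illustrative buffer size and pipeline depth); it is not the implementation evaluated in the paper.

#include <mpi.h>
#include <stdlib.h>

#define COUNT    (1 << 22)  /* total floats to reduce; illustrative size      */
#define SEGMENTS 8          /* pipeline depth; illustrative, tune per network */

/* Variant (2): non-blocking Allreduce completed at once via MPI_Wait,
 * semantically equivalent to the blocking MPI_Allreduce. */
static void allreduce_nonblocking(const float *sendbuf, float *recvbuf, int count)
{
    MPI_Request req;
    MPI_Iallreduce(sendbuf, recvbuf, count, MPI_FLOAT, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}

/* Variant (3): pipelined Allreduce. The buffer is split into SEGMENTS
 * chunks and one non-blocking collective is started per chunk. All ranks
 * must issue the collectives in the same order; MPI_Waitall completes
 * the whole pipeline. */
static void allreduce_pipelined(const float *sendbuf, float *recvbuf, int count)
{
    MPI_Request reqs[SEGMENTS];
    const int base = count / SEGMENTS;
    for (int s = 0; s < SEGMENTS; s++) {
        const int offset = s * base;
        /* Last chunk absorbs any remainder when count % SEGMENTS != 0. */
        const int len = (s == SEGMENTS - 1) ? count - offset : base;
        MPI_Iallreduce(sendbuf + offset, recvbuf + offset, len, MPI_FLOAT,
                       MPI_SUM, MPI_COMM_WORLD, &reqs[s]);
    }
    MPI_Waitall(SEGMENTS, reqs, MPI_STATUSES_IGNORE);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    float *send = malloc(COUNT * sizeof *send);
    float *recv = malloc(COUNT * sizeof *recv);
    for (int i = 0; i < COUNT; i++) send[i] = 1.0f;  /* dummy gradient data */

    allreduce_nonblocking(send, recv, COUNT);  /* variant (2) */
    allreduce_pipelined(send, recv, COUNT);    /* variant (3) */

    free(send);
    free(recv);
    MPI_Finalize();
    return 0;
}

The same segmentation idea carries over to other non-blocking collectives such as MPI_Ibcast or MPI_Ireduce_scatter, which is in the spirit of point (4) of the abstract.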
Pages: 3797-3813 (17 pages)
Related papers
50 records in total
  • [41] Accelerating CEST imaging using a model-based deep neural network with synthetic training data
    Xu, Jianping
    Zu, Tao
    Hsu, Yi-Cheng
    Wang, Xiaoli
    Chan, Kannie W. Y.
    Zhang, Yi
    [J]. MAGNETIC RESONANCE IN MEDICINE, 2023: 583 - 599
  • [42] Distributed Framework for Accelerating Training of Deep Learning Models through Prioritization
    Zhou, Tian
    Gao, Lixin
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING, IC2E 2021, 2021, : 201 - 209
  • [43] Survey on Network of Distributed Deep Learning Training
    Zhu, Hongrui
    Yuan, Guojun
    Yao, Chengji
    Tan, Guangming
    Wang, Zhan
    Hu, Zhongzhe
    Zhang, Xiaoyang
    An, Xuejun
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (01): : 98 - 115
  • [44] Distributed Graph Neural Network Training: A Survey
    Shao, Yingxia
    Li, Hongzheng
    Gu, Xizhi
    Yin, Hongbo
    Li, Yawen
    Miao, Xupeng
    Zhang, Wentao
    Cui, Bin
    Chen, Lei
    [J]. ACM COMPUTING SURVEYS, 2024, 56 (08)
  • [45] NeuralGenesis: a software for distributed neural network training
    Tsoulos, Ioannis
    Tzallas, Alexandros T.
    Tsalikakis, Dimitrios G.
    Giannakeas, Nikolaos
    Tsipouras, Markos G.
    Androulidakis, Iosif
    Zaitseva, Elena
    [J]. 2016 24TH TELECOMMUNICATIONS FORUM (TELFOR), 2016, : 841 - 844
  • [46] Efficient MPI-AllReduce for large-scale deep learning on GPU-clusters
    Truong Thao Nguyen
    Wahib, Mohamed
    Takano, Ryousei
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (12)
  • [47] GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
    Cai, Tianle
    Luo, Shengjie
    Xu, Keyulu
    He, Di
    Liu, Tie-Yan
    Wang, Liwei
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [48] fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU
    Chen, Zhaodong
    Yan, Mingyu
    Zhu, Maohua
    Deng, Lei
    Li, Guoqi
    Li, Shuangchen
    Xie, Yuan
    [J]. 2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2020
  • [49] Accelerating Neural Network Training with Processing-in-Memory GPU
    Fei, Xiang
    Han, Jianhui
    Huang, Jianqiang
    Zheng, Weimin
    Zhang, Youhui
    [J]. 2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022), 2022, : 414 - 421
  • [50] DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices
    Li, Dawei
    Wang, Xiaolong
    Kong, Deguang
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 2322 - 2330