Accelerating distributed deep neural network training with pipelined MPI allreduce

Cited by: 6
Authors
Castello, Adrian [1 ]
Quintana-Orti, Enrique S. [1 ]
Duato, Jose [1 ]
Affiliations
[1] Univ Politecn Valencia, Valencia, Spain
Keywords
Message Passing Interface (MPI); Collective communication primitives; Allreduce; Deep learning; Distributed training; COLLECTIVE COMMUNICATION;
DOI
10.1007/s10586-021-03370-9
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Classification Code
0812;
Abstract
TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD in turn relies on a blocking Allreduce primitive to share information among processes, combined with a communication thread that overlaps communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3) we explore the benefits of applying pipelining to the communication exchange, demonstrating that these improvements carry over to distributed training via TF+HVD. Finally, (4) we show that pipelining can also boost performance for applications that make heavy use of other collectives, such as Broadcast and Reduce-Scatter.
Pages: 3797-3813
Number of pages: 17
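As a concrete illustration of points (2) and (3) in the abstract, the sketch below splits a large reduction into segments, issues one non-blocking MPI_Iallreduce per segment, and then waits on all outstanding requests so the caller still observes blocking semantics. The segment size, buffer contents, and the helper name pipelined_allreduce are illustrative assumptions for this sketch, not the paper's actual implementation.

/* Minimal sketch of a segmented ("pipelined") Allreduce built from
 * non-blocking MPI_Iallreduce calls. Segment size and function name
 * are illustrative assumptions, not the authors' implementation. */
#include <mpi.h>
#include <stdlib.h>

/* Reduce 'count' floats in place across MPI_COMM_WORLD, issuing one
 * MPI_Iallreduce per segment so that successive segments can overlap. */
static void pipelined_allreduce(float *buf, int count, int seg)
{
    int nseg = (count + seg - 1) / seg;
    MPI_Request *reqs = malloc(nseg * sizeof(MPI_Request));

    for (int s = 0; s < nseg; ++s) {
        int off = s * seg;
        int len = (off + seg <= count) ? seg : count - off;
        MPI_Iallreduce(MPI_IN_PLACE, buf + off, len, MPI_FLOAT,
                       MPI_SUM, MPI_COMM_WORLD, &reqs[s]);
    }
    /* Waiting on all segments restores the blocking behaviour that the
     * caller (e.g., a fused gradient Allreduce) expects. */
    MPI_Waitall(nseg, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int n = 1 << 24;                        /* e.g., one fused gradient buffer */
    float *grad = malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) grad[i] = 1.0f;

    pipelined_allreduce(grad, n, 1 << 20);  /* 1 Mi-element segments (assumed) */

    free(grad);
    MPI_Finalize();
    return 0;
}

Note that the overlap only materializes if the MPI library makes asynchronous progress on the outstanding requests, and the segment size generally has to be tuned to the message size and network characteristics.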