Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks

Cited by: 0
Authors
Adrián Castelló
Mar Catalán
Manuel F. Dolz
Enrique S. Quintana-Ortí
José Duato
Institutions
[1] Universitat Politècnica de València
[2] Universitat Jaume I
Source
Computing | 2023 / Volume 105
Keywords
Message passing interface (MPI); Collective communication primitives; Allreduce; Deep learning; Distributed training
DOI
Not available
Abstract
For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance of the system increases yet eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, a key to the efficient data-parallel distributed training of CNNs. Our study targets the distinct realizations of this primitive in three high performance instances of Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates the training throughput by a factor of 1.2× compared with the default algorithm in the same MPI library, and up to 2.8× when comparing distinct MPI libraries in a number of relevant combinations of CNN model+dataset.
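For context only (this is not code from the paper): in data-parallel training, each worker computes gradients on its local mini-batch and the Allreduce primitive studied in the abstract aggregates them across all ranks. The minimal C sketch below shows such a gradient exchange with MPI_Allreduce; the buffer length, variable names, and the averaging step are illustrative assumptions.

/* Minimal sketch (assumed setup): sum the local gradient buffer of every
 * rank with MPI_Allreduce and average it, as done per iteration in
 * data-parallel CNN training. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                  /* gradient length (assumed) */
    float *grad = malloc(n * sizeof *grad); /* local gradients of this rank */
    for (int i = 0; i < n; i++) grad[i] = 1.0f;

    /* Sum the gradients of all ranks; every rank receives the result
     * (in place, so no separate receive buffer is needed). */
    MPI_Allreduce(MPI_IN_PLACE, grad, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    /* Average to obtain the global gradient used for the weight update. */
    for (int i = 0; i < n; i++) grad[i] /= (float)size;

    free(grad);
    MPI_Finalize();
    return 0;
}

In a Horovod+TensorFlow setup, analogous exchanges are issued for each gradient tensor behind the scenes, which is where the choice of MPI library and Allreduce realization discussed in the abstract takes effect.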
Pages: 1101–1119
Page count: 18