Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks

Cited by: 0
Authors
Adrián Castelló
Mar Catalán
Manuel F. Dolz
Enrique S. Quintana-Ortí
José Duato
Affiliations
[1] Universitat Politècnica de València
[2] Universitat Jaume I
Source
Computing | 2023 / Volume 105
Keywords
Message passing interface (MPI); Collective communication primitives; Allreduce; Deep learning; Distributed training; 6804;
DOI
Not available
Abstract
For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance of the system increases yet eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, a key to the efficient data-parallel distributed training of CNNs. Our study targets the distinct realizations of this primitive in three high performance instances of Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates the training throughput by a factor of 1.2× compared with the default algorithm in the same MPI library, and up to 2.8× when comparing distinct MPI libraries in a number of relevant combinations of CNN model+dataset.
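For readers unfamiliar with the primitive under study, the following minimal sketch (not the paper's benchmark code) illustrates the role MPI_Allreduce plays in data-parallel training: each worker contributes its locally computed gradients, every worker receives the element-wise sum, and the result is scaled by the number of workers to obtain the averaged gradient. The buffer size and initialization are illustrative assumptions only.

```c
/* Minimal sketch: averaging a gradient buffer across ranks with
 * MPI_Allreduce, as done in data-parallel training of CNNs.
 * The buffer length and contents are placeholders. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical gradient size; real CNN layers exchange buffers of
     * millions of elements per training step. */
    const int n = 1 << 20;
    float *grad = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++)
        grad[i] = (float)rank;  /* stand-in for locally computed gradients */

    /* Sum the per-rank gradients in place; every rank receives the result. */
    MPI_Allreduce(MPI_IN_PLACE, grad, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    /* Divide by the number of workers to obtain the averaged gradient. */
    for (int i = 0; i < n; i++)
        grad[i] /= (float)size;

    if (rank == 0)
        printf("averaged gradient[0] = %f\n", grad[0]);

    free(grad);
    MPI_Finalize();
    return 0;
}
```

Which algorithm the library runs behind this single call (ring, recursive doubling, Rabenseifner's reduce-scatter/allgather, and so on) is precisely what the paper's analysis compares. Most MPI implementations expose runtime controls to override the default selection, for example Open MPI's coll_tuned MCA parameters or Intel MPI's I_MPI_ADJUST_ALLREDUCE variable, although the exact knobs and the available algorithms vary by library and version.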
Pages: 1101–1119
Number of pages: 18