Performance Modeling for Distributed Training of Convolutional Neural Networks

Cited by: 2
|
Authors
Castelló, Adrián [1]
Catalán, Mar [1]
Dolz, Manuel F. [1]
Mestre, José I. [1]
Quintana-Ortí, Enrique S. [2]
Duato, José [2]
Affiliations
[1] Univ Jaume I, Castellón de la Plana, Spain
[2] Univ Politecn Valencia, Valencia, Spain
Keywords
Deep neural networks (DNNs); distributed training; analytical modeling; clusters; COLLECTIVE COMMUNICATION;
DOI
10.1109/PDP52278.2021.00024
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We perform a theoretical analysis comparing the scalability of data versus model parallelism, applied to the distributed training of deep convolutional neural networks (CNNs), along five axes: batch size, node (floating-point) arithmetic performance, node memory bandwidth, network link bandwidth, and cluster dimension. Our study relies on analytical performance models that can be configured to reproduce the components and organization of the CNN model as well as the hardware configuration of the target distributed platform. In addition, we provide evidence of the accuracy of the analytical models by validating them against a Python library for distributed deep learning training.
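To make the kind of analysis described in the abstract concrete, the following is a minimal Python sketch of a roofline-style analytical cost model for one convolutional layer trained under data parallelism. It is not the authors' model from the paper; the cost formulas, the ring-allreduce communication term, and all parameter names (Node, conv_layer_time, etc.) are illustrative assumptions that merely exercise the five axes mentioned above (batch size, node arithmetic performance, memory bandwidth, link bandwidth, cluster dimension).

```python
# Hedged sketch of an analytical per-iteration cost model for a single
# convolutional layer under data parallelism. All formulas and names are
# assumptions for illustration, not the model proposed in the paper.
from dataclasses import dataclass


@dataclass
class Node:
    flops: float    # peak floating-point rate (FLOP/s)
    mem_bw: float   # node memory bandwidth (bytes/s)
    link_bw: float  # network link bandwidth (bytes/s)


def conv_layer_time(batch, c_in, c_out, h, w, k, node: Node, nodes: int) -> float:
    """Estimated per-iteration time (s) for one conv layer, data-parallel."""
    local_batch = batch / nodes  # data parallelism splits the batch across nodes
    # Forward-pass FLOPs (2 * MACs) for a k x k convolution producing h x w outputs.
    flops = 2.0 * local_batch * c_out * c_in * k * k * h * w
    # Rough fp32 traffic: input activations plus the layer's weights.
    bytes_moved = 4.0 * (local_batch * c_in * h * w + c_out * c_in * k * k)
    # Roofline bound: the layer is either compute- or memory-bandwidth-limited.
    t_compute = max(flops / node.flops, bytes_moved / node.mem_bw)
    # Weight gradients exchanged with a bandwidth-optimal ring allreduce.
    grad_bytes = 4.0 * c_out * c_in * k * k
    t_allreduce = 2.0 * (nodes - 1) / nodes * grad_bytes / node.link_bw
    return t_compute + t_allreduce


# Example: batch 64, 3 -> 64 channels, 224 x 224 outputs, 3 x 3 kernel, 8 nodes.
node = Node(flops=10e12, mem_bw=200e9, link_bw=12.5e9)
print(f"{conv_layer_time(64, 3, 64, 224, 224, 3, node, 8) * 1e3:.3f} ms")
```

Sweeping the batch size or the number of nodes in such a model is what exposes the scalability trade-off between compute time (which shrinks with more nodes) and gradient-communication time (which does not), the kind of comparison the abstract describes.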
Pages: 99 - 108
Page count: 10