Performance Modeling for Distributed Training of Convolutional Neural Networks

Cited by: 2
Authors
Castello, Adrian [1]
Catalan, Mar [1]
Dolz, Manuel F. [1]
Mestre, Jose I. [1]
Quintana-Orti, Enrique S. [2]
Duato, Jose [2]
Affiliations
[1] Univ Jaume 1, Castellon de La Plana, Spain
[2] Univ Politecn Valencia, Valencia, Spain
Keywords
Deep neural networks (DNNs); distributed training; analytical modeling; clusters; collective communication
DOI
10.1109/PDP52278.2021.00024
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We perform a theoretical analysis that compares the scalability of data parallelism and model parallelism in the distributed training of deep convolutional neural networks (CNNs). The comparison runs along five axes: batch size, node (floating-point) arithmetic performance, node memory bandwidth, network link bandwidth, and cluster dimension. Our study relies on analytical performance models that can be configured to reproduce the components and organization of the CNN model as well as the hardware configuration of the target distributed platform. We also provide evidence of the accuracy of these analytical models by validating them against a Python library for distributed deep learning training.
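The abstract does not give the paper's model equations. The sketch below is a hypothetical illustration only, not the authors' model. It shows how the five axes named above (batch size, node FLOP/s, node memory bandwidth, link bandwidth, number of nodes) can enter an analytical per-iteration cost for both parallelization schemes. It rests on several assumptions:

- Per-layer compute time follows a roofline-style bound.
- Training costs about 3x the forward FLOPs.
- Data parallelism uses a ring allreduce of gradients.
- Model parallelism splits every layer across nodes and uses ring allgathers of activations.
- There is no latency term and no overlap of compute with communication.

The names `Layer`, `Cluster`, `data_parallel_time` and `model_parallel_time` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    flops_per_sample: float      # forward FLOPs for one sample (assumed input)
    weight_bytes: float          # size of the layer parameters in bytes
    act_bytes_per_sample: float  # size of the layer output for one sample


@dataclass
class Cluster:
    nodes: int          # cluster dimension P
    node_flops: float   # peak arithmetic performance per node (FLOP/s)
    node_mem_bw: float  # memory bandwidth per node (bytes/s)
    link_bw: float      # network link bandwidth (bytes/s)


def compute_time(flops: float, bytes_moved: float, c: Cluster) -> float:
    # Roofline-style bound: limited either by arithmetic or by memory traffic.
    return max(flops / c.node_flops, bytes_moved / c.node_mem_bw)


def data_parallel_time(layers: list[Layer], batch: int, c: Cluster) -> float:
    p = c.nodes
    local = batch / p  # each node processes batch / P samples
    t = 0.0
    for l in layers:
        # Forward + backward assumed to cost 3x the forward FLOPs.
        t += compute_time(3 * l.flops_per_sample * local,
                          l.weight_bytes + 2 * l.act_bytes_per_sample * local, c)
        # Ring allreduce of the full gradient: 2 (P-1)/P * W bytes per link.
        t += 2 * (p - 1) / p * l.weight_bytes / c.link_bw
    return t


def model_parallel_time(layers: list[Layer], batch: int, c: Cluster) -> float:
    p = c.nodes
    t = 0.0
    for l in layers:
        # Each node holds 1/P of the weights and computes 1/P of the outputs.
        t += compute_time(3 * l.flops_per_sample * batch / p,
                          l.weight_bytes / p + 2 * l.act_bytes_per_sample * batch / p, c)
        # Ring allgather of activations, once forward and once backward.
        t += 2 * (p - 1) / p * l.act_bytes_per_sample * batch / c.link_bw
    return t


if __name__ == "__main__":
    layers = [Layer(1e9, 4e6, 1e6), Layer(5e8, 16e6, 2e5)]
    cluster = Cluster(nodes=8, node_flops=1e13, node_mem_bw=5e11, link_bw=1.25e10)
    print(data_parallel_time(layers, 256, cluster))
    print(model_parallel_time(layers, 256, cluster))
```

In this toy model the communication volume per iteration differs between the two schemes. Data parallelism communicates in proportion to the weight size and does not depend on the batch. Model parallelism communicates in proportion to activation size times batch size. This is one simple way the batch-size and link-bandwidth axes can favor one scheme over the other.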
Pages: 99-108
Number of pages: 10