Performance Modeling for Distributed Training of Convolutional Neural Networks

Cited by: 2
Authors
Castello, Adrian [1 ]
Catalan, Mar [1 ]
Dolz, Manuel F. [1 ]
Mestre, Jose I. [1]
Quintana-Orti, Enrique S. [2 ]
Duato, Jose [2 ]
Affiliations
[1] Univ Jaume I, Castellon de la Plana, Spain
[2] Univ Politecn Valencia, Valencia, Spain
Keywords
Deep neural networks (DNNs); distributed training; analytical modeling; clusters; COLLECTIVE COMMUNICATION;
DOI
10.1109/PDP52278.2021.00024
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We perform a theoretical analysis comparing the scalability of data versus model parallelism, applied to the distributed training of deep convolutional neural networks (CNNs), along five axes: batch size, node (floating-point) arithmetic performance, node memory bandwidth, network link bandwidth, and cluster dimension. Our study relies on analytical performance models that can be configured to reproduce the components and organization of the CNN model as well as the hardware configuration of the target distributed platform. In addition, we provide evidence of the accuracy of the analytical models by performing a validation against a Python library for distributed deep learning training.
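The data-parallel side of such an analytical model can be sketched roughly as follows. This is a minimal illustration of the general approach (per-iteration time as compute plus gradient-exchange cost); the function name, parameters, and the compute-plus-ring-all-reduce formula are assumptions for this sketch, not the paper's actual model.

```python
def data_parallel_step_time(flops_per_sample, batch_size, num_nodes,
                            node_flops, num_params, link_bandwidth):
    """Rough estimate (seconds) of one data-parallel training step.

    Illustrative assumptions: perfect batch splitting, FP32 gradients,
    bandwidth-only ring all-reduce, no compute/communication overlap.
    """
    # Compute: each node processes batch_size / num_nodes samples.
    t_comp = (flops_per_sample * batch_size / num_nodes) / node_flops
    # Communication: a ring all-reduce moves 2*(p-1)/p of the gradient
    # buffer (4 bytes per FP32 parameter) over each node's link.
    grad_bytes = 4 * num_params
    t_comm = 2 * (num_nodes - 1) / num_nodes * grad_bytes / link_bandwidth
    return t_comp + t_comm
```

Evaluating such a closed-form model over the five axes above (batch size, node FLOP rate, memory bandwidth, link bandwidth, cluster size) is what lets the authors compare data and model parallelism without running the full training.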
Pages: 99 - 108 (10 pages)
Related Papers (50 total)
  • [21] Training Strategies for Convolutional Neural Networks with Transformed Input
    Khandani, Masoumeh Kalantari
    Mikhael, Wasfy B.
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 1058 - 1061
  • [22] Efficient Incremental Training for Deep Convolutional Neural Networks
    Tao, Yudong
    Tu, Yuexuan
    Shyu, Mei-Ling
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 286 - 291
  • [23] Efficient Training of Convolutional Neural Nets on Large Distributed Systems
    Sreedhar, Dheeraj
    Saxena, Vaibhav
    Sabharwal, Yogish
    Verma, Ashish
    Kumar, Sameer
    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2018, : 392 - 401
  • [24] Privacy preserving distributed training of neural networks
    Nikolaidis, Spyridon
    Refanidis, Ioannis
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (23): 17333 - 17350
  • [26] DeepTracker: Visualizing the Training Process of Convolutional Neural Networks
    Liu, Dongyu
    Cui, Weiwei
    Jin, Kai
    Guo, Yuxiao
    Qu, Huamin
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2019, 10 (01)
  • [27] CONVOLUTIONAL NEURAL NETWORKS AND TRAINING STRATEGIES FOR SKIN DETECTION
    Kim, Yoonsik
    Hwang, Insung
    Cho, Nam Ik
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3919 - 3923
  • [28] A framework for parallel and distributed training of neural networks
    Scardapane, Simone
    Di Lorenzo, Paolo
    NEURAL NETWORKS, 2017, 91 : 42 - 54
  • [29] Training Deep Convolutional Neural Networks to Play Go
    Clark, Christopher
    Storkey, Amos
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1766 - 1774
  • [30] Facial Action Units for Training Convolutional Neural Networks
    Trinh Thi Doan Pham
    Won, Chee Sun
    IEEE ACCESS, 2019, 7 : 77816 - 77824