Communication-Efficient Parallelization Strategy for Deep Convolutional Neural Network Training

Times Cited: 0
Authors
Lee, Sunwoo [1 ]
Agrawal, Ankit [1 ]
Balaprakash, Prasanna [2 ]
Choudhary, Alok [1 ]
Liao, Wei-keng [1 ]
Affiliations
[1] Northwestern Univ, EECS Dept, Evanston, IL 60208 USA
[2] Argonne Natl Lab, Lemont, IL USA
Keywords
Convolutional Neural Network; Deep Learning; Parallelization; Distributed-Memory Parallelization;
DOI
10.1109/MLHPC.2018.000-4
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Training Convolutional Neural Network (CNN) models is extremely time-consuming, and the efficiency of parallelization plays a key role in finishing the training within a reasonable amount of time. The well-known synchronous Stochastic Gradient Descent (SGD) algorithm suffers from high inter-process communication and synchronization costs. To address these problems, the asynchronous SGD algorithm employs a master-slave model for parameter updates. However, it can result in a poor convergence rate due to gradient staleness, and the master-slave model does not scale to a large number of compute nodes. In this paper, we present a communication-efficient gradient averaging algorithm for synchronous SGD that adopts a few design strategies to maximize the degree of overlap between computation and communication. A time-complexity analysis shows that our algorithm outperforms the traditional allreduce-based algorithm. Training two popular deep CNN models, VGG-16 and ResNet-50, on the ImageNet dataset, our experiments on Cori Phase-I, a Cray XC40 supercomputer at NERSC, show that our algorithm achieves a 2516.36x speedup for VGG-16 and a 2734.25x speedup for ResNet-50 using up to 8192 cores.
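The key idea the abstract describes, hiding gradient-averaging communication behind gradient computation, can be illustrated with layer-wise non-blocking allreduce calls issued during back-propagation. The snippet below is only a minimal sketch of that general pattern, not the paper's algorithm; it assumes mpi4py on top of an MPI-3 library (required for Iallreduce) and NumPy, and the layer shapes and the backward() stub are invented for illustration.

# Sketch: overlapping layer-wise gradient averaging with back-propagation.
# Illustrative only; NOT the paper's exact algorithm. Assumes mpi4py + MPI-3.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
nprocs = comm.Get_size()

# Hypothetical per-layer gradient shapes.
layer_shapes = [(512, 512), (512, 256), (256, 10)]

def backward(layer_idx):
    """Stand-in for computing one layer's local gradient on this rank."""
    return np.random.rand(*layer_shapes[layer_idx])

requests = []
averaged = [None] * len(layer_shapes)
send_bufs = []                                  # keep send buffers alive until Wait()
for i in reversed(range(len(layer_shapes))):    # back-propagation order: last layer first
    grad = backward(i)                          # compute this layer's local gradient
    recv = np.empty_like(grad)
    send_bufs.append(grad)
    # Start a non-blocking allreduce; the communication can progress while the
    # gradients of the earlier layers are still being computed.
    requests.append((i, comm.Iallreduce(grad, recv, op=MPI.SUM)))
    averaged[i] = recv

for i, req in requests:
    req.Wait()                                  # complete the outstanding communication
    averaged[i] /= nprocs                       # turn the global sum into an average

if comm.Get_rank() == 0:
    print("gradient averaging finished; ready for the SGD weight update")

Run with, e.g., "mpirun -np 4 python sketch.py"; in a real training loop the overlap comes from issuing each Iallreduce as soon as a layer's gradient is available instead of waiting for a single allreduce over all parameters.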
Pages: 47-56
Page count: 10
Related Papers (50 records in total)
  • [41] Efficient training for the hybrid optical diffractive deep neural network
    Fang, Tao
    Lia, Jingwei
    Wu, Tongyu
    Cheng, Ming
    Dong, Xiaowen
    AI AND OPTICAL DATA SCIENCES III, 2022, 12019
  • [42] Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
    Mostafa, Hesham
    Wang, Xin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [43] DGS: Communication-Efficient Graph Sampling for Distributed GNN Training
    Wan, Xinchen
    Chen, Kai
    Zhang, Yiming
    2022 IEEE 30TH INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS (ICNP 2022), 2022
  • [44] Communication-Efficient Learning of Deep Networks from Decentralized Data
    McMahan, H. Brendan
    Moore, Eider
    Ramage, Daniel
    Hampson, Seth
    Aguera y Arcas, Blaise
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 1273 - 1282
  • [45] Communication-Efficient Federated DNN Training: Convert, Compress, Correct
    Chen, Zhong-Jing
    Hernandez, Eduin E.
    Huang, Yu-Chih
    Rini, Stefano
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (24) : 40431 - 40447
  • [46] Communication-Efficient Distributed Deep Metric Learning with Hybrid Synchronization
    Su, Yuxin
    Lyu, Michael
    King, Irwin
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 1463 - 1472
  • [47] CGX: Adaptive System Support for Communication-Efficient Deep Learning
    Markov, Ilia
    Ramezanikebrya, Hamidreza
    Alistarh, Dan
    PROCEEDINGS OF THE TWENTY-THIRD ACM/IFIP INTERNATIONAL MIDDLEWARE CONFERENCE, MIDDLEWARE 2022, 2022, : 241 - 254
  • [48] Automatic Delineation Strategy for Brain Metastases Using Deep Convolutional Neural Network
    Liu, Y.
    Stojadinovic, S.
    Hrycushko, B.
    Wardak, Z.
    Lu, W.
    Yan, Y.
    Jiang, S.
    Zhen, X.
    Timmerman, R.
    Abdulrahman, R.
    Nedzi, L.
    Gu, X.
    MEDICAL PHYSICS, 2017, 44 (06) : 3009 - 3010
  • [49] Deep Convolutional Neural Network Compression based on the Intrinsic Dimension of the Training Data
    Hadi, Abir Mohammad
    Won, Kwanghee
    APPLIED COMPUTING REVIEW, 2024, 24 (01): : 14 - 23
  • [50] FxpNet: Training a Deep Convolutional Neural Network in Fixed-Point Representation
    Chen, Xi
    Hu, Xiaolin
    Zhou, Hucheng
    Xu, Ningyi
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 2494 - 2501