Communication-Efficient Parallelization Strategy for Deep Convolutional Neural Network Training

Cited: 0
Authors
Lee, Sunwoo [1 ]
Agrawal, Ankit [1 ]
Balaprakash, Prasanna [2 ]
Choudhary, Alok [1 ]
Liao, Wei-keng [1 ]
Affiliations
[1] Northwestern Univ, EECS Dept, Evanston, IL 60208 USA
[2] Argonne Natl Lab, Lemont, IL USA
Keywords
Convolutional Neural Network; Deep Learning; Parallelization; Distributed-Memory Parallelization
DOI
10.1109/MLHPC.2018.000-4
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Training Convolutional Neural Network (CNN) models is extremely time-consuming, and the efficiency of parallelization plays a key role in finishing training in a reasonable amount of time. The well-known synchronous Stochastic Gradient Descent (SGD) algorithm suffers from high inter-process communication and synchronization costs. To address these problems, the asynchronous SGD algorithm employs a master-slave model for parameter updates. However, it can suffer a poor convergence rate due to gradient staleness, and the master-slave model does not scale to a large number of compute nodes. In this paper, we present a communication-efficient gradient averaging algorithm for synchronous SGD that adopts several design strategies to maximize the overlap between computation and communication. A time-complexity analysis shows that our algorithm outperforms the traditional allreduce-based algorithm. Training two popular deep CNN models, VGG-16 and ResNet-50, on the ImageNet dataset, our experiments on Cori Phase-I, a Cray XC40 supercomputer at NERSC, show that our algorithm achieves a 2516.36x speedup for VGG-16 and a 2734.25x speedup for ResNet-50 using up to 8192 cores.
Pages: 47-56
Page count: 10
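The core idea in the abstract is overlapping gradient communication with computation in synchronous SGD. The sketch below is only a rough, hypothetical illustration of that general idea, not the authors' algorithm (the paper proposes an alternative to plain allreduce): it assumes mpi4py with MPI-3 non-blocking collectives, and starts each layer's gradient reduction as soon as back-propagation produces it, with placeholder layer shapes and a placeholder update step.

# Minimal, hypothetical sketch (mpi4py assumed; not the paper's algorithm):
# overlap layer-wise gradient averaging with back-propagation by issuing a
# non-blocking allreduce for each layer's gradient as soon as it is computed.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
nprocs = comm.Get_size()

# Placeholder per-layer gradient buffers (shapes are illustrative only).
layer_shapes = [(64, 3, 3, 3), (128, 64, 3, 3), (1000, 2048)]
grads = [np.random.rand(*s).astype(np.float32) for s in layer_shapes]
summed = [np.empty_like(g) for g in grads]

# Back-propagation produces gradients from the last layer to the first, so
# start each layer's reduction immediately and let it proceed while the
# gradients of the earlier layers are still being computed.
requests = []
for i in reversed(range(len(grads))):
    # ... compute grads[i] here in a real training step ...
    requests.append(comm.Iallreduce(grads[i], summed[i], op=MPI.SUM))

# Wait for all outstanding reductions, then average and apply the SGD update.
MPI.Request.Waitall(requests)
lr = 0.01  # placeholder learning rate
for i, g in enumerate(summed):
    g /= nprocs            # gradient averaging across processes
    # params[i] -= lr * g  # parameter update (params omitted in this sketch)

Issuing the reductions in reverse layer order gives the last layers, whose gradients are ready first, the longest window to overlap their communication with the remaining back-propagation.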
Related Papers
50 items in total
  • [31] HGP4CNN: an efficient parallelization framework for training convolutional neural networks on modern GPUs
    Fu, Hao; Tang, Shanjiang; He, Bingsheng; Yu, Ce; Sun, Jizhou
    The Journal of Supercomputing, 2021, 77(11): 12741-12770
  • [33] GCNTrain: A Unified and Efficient Accelerator for Graph Convolutional Neural Network Training
    Lu, Heng; Song, Zhuoran; Li, Xing; Jing, Naifeng; Liang, Xiaoyao
    2022 IEEE 40th International Conference on Computer Design (ICCD 2022), 2022: 730-737
  • [34] Author Correction: A deep convolutional neural network for efficient microglia detection
    Suleymanova, Ilida; Bychkov, Dmitrii; Kopra, Jaakko
    Scientific Reports, 14
  • [35] General Bitwidth Assignment for Efficient Deep Convolutional Neural Network Quantization
    Fei, Wen; Dai, Wenrui; Li, Chenglin; Zou, Junni; Xiong, Hongkai
    IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(10): 5253-5267
  • [36] Communication-Efficient Weighted ADMM for Decentralized Network Optimization
    Ling, Qing; Liu, Yaohua; Shi, Wei; Tian, Zhi
    2016 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2016: 4821-4825
  • [37] More Efficient Training Strategy to Leverage Neurons in Neural Network
    Liou, Cheng-Fu; Yu, Yi-Cheng
    2024 33rd International Symposium on Industrial Electronics (ISIE 2024), 2024
  • [38] Communication-Efficient Privacy-Preserving Neural Network Inference via Arithmetic Secret Sharing
    Bi, Renwan; Xiong, Jinbo; Luo, Changqing; Ning, Jianting; Liu, Ximeng; Tian, Youliang; Zhang, Yan
    IEEE Transactions on Information Forensics and Security, 2024, 19: 6722-6737
  • [39] GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
    Tyagi, Sahil; Swany, Martin
    2023 IEEE 16th International Conference on Cloud Computing (CLOUD), 2023: 319-329
  • [40] Gist: Efficient Data Encoding for Deep Neural Network Training
    Jain, Animesh; Phanishayee, Amar; Mars, Jason; Tang, Lingjia; Pekhimenko, Gennady
    2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018: 776-789