Communication-Efficient Parallelization Strategy for Deep Convolutional Neural Network Training

Cited: 0
Authors
Lee, Sunwoo [1 ]
Agrawal, Ankit [1 ]
Balaprakash, Prasanna [2 ]
Choudhary, Alok [1 ]
Liao, Wei-keng [1 ]
Affiliations
[1] Northwestern Univ, EECS Dept, Evanston, IL 60208 USA
[2] Argonne Natl Lab, Lemont, IL USA
Keywords
Convolutional Neural Network; Deep Learning; Parallelization; Distributed-Memory Parallelization
DOI
10.1109/MLHPC.2018.000-4
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Training Convolutional Neural Network (CNN) models is extremely time-consuming, and the efficiency of parallelization plays a key role in finishing training in a reasonable amount of time. The well-known synchronous Stochastic Gradient Descent (SGD) algorithm suffers from high inter-process communication and synchronization costs. To address these problems, the asynchronous SGD algorithm employs a master-slave model for parameter updates. However, it can suffer a poor convergence rate due to gradient staleness, and the master-slave model does not scale to a large number of compute nodes. In this paper, we present a communication-efficient gradient averaging algorithm for synchronous SGD that adopts several design strategies to maximize the overlap between computation and communication. A time-complexity analysis shows that our algorithm outperforms the traditional allreduce-based algorithm. Training two popular deep CNN models, VGG-16 and ResNet-50, on the ImageNet dataset, our experiments on Cori Phase-I, a Cray XC40 supercomputer at NERSC, show that our algorithm achieves a 2516.36x speedup for VGG-16 and a 2734.25x speedup for ResNet-50 using up to 8192 cores.
Pages: 47-56
Page count: 10
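The core idea in the abstract is overlapping gradient communication with computation in synchronous SGD. The sketch below is only a rough, hypothetical illustration of that general idea, not the authors' algorithm (the paper proposes an alternative to plain allreduce): it assumes mpi4py with MPI-3 non-blocking collectives, and starts each layer's gradient reduction as soon as back-propagation produces it, with placeholder layer shapes and a placeholder update step.

# Minimal, hypothetical sketch (mpi4py assumed; not the paper's algorithm):
# overlap layer-wise gradient averaging with back-propagation by issuing a
# non-blocking allreduce for each layer's gradient as soon as it is computed.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
nprocs = comm.Get_size()

# Placeholder per-layer gradient buffers (shapes are illustrative only).
layer_shapes = [(64, 3, 3, 3), (128, 64, 3, 3), (1000, 2048)]
grads = [np.random.rand(*s).astype(np.float32) for s in layer_shapes]
summed = [np.empty_like(g) for g in grads]

# Back-propagation produces gradients from the last layer to the first, so
# start each layer's reduction immediately and let it proceed while the
# gradients of the earlier layers are still being computed.
requests = []
for i in reversed(range(len(grads))):
    # ... compute grads[i] here in a real training step ...
    requests.append(comm.Iallreduce(grads[i], summed[i], op=MPI.SUM))

# Wait for all outstanding reductions, then average and apply the SGD update.
MPI.Request.Waitall(requests)
lr = 0.01  # placeholder learning rate
for i, g in enumerate(summed):
    g /= nprocs            # gradient averaging across processes
    # params[i] -= lr * g  # parameter update (params omitted in this sketch)

Issuing the reductions in reverse layer order gives the last layers, whose gradients are ready first, the longest window to overlap their communication with the remaining back-propagation.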
Related Papers
50 items in total
  • [31] HGP4CNN: an efficient parallelization framework for training convolutional neural networks on modern GPUs
    Fu, Hao; Tang, Shanjiang; He, Bingsheng; Yu, Ce; Sun, Jizhou
    The Journal of Supercomputing, 2021, 77(11): 12741-12770
  • [33] GCNTrain: A Unified and Efficient Accelerator for Graph Convolutional Neural Network Training
    Lu, Heng; Song, Zhuoran; Li, Xing; Jing, Naifeng; Liang, Xiaoyao
    2022 IEEE 40th International Conference on Computer Design (ICCD 2022), 2022: 730-737
  • [34] Author Correction: A deep convolutional neural network for efficient microglia detection
    Suleymanova, Ilida; Bychkov, Dmitrii; Kopra, Jaakko
    Scientific Reports, 14
  • [35] General Bitwidth Assignment for Efficient Deep Convolutional Neural Network Quantization
    Fei, Wen; Dai, Wenrui; Li, Chenglin; Zou, Junni; Xiong, Hongkai
    IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(10): 5253-5267
  • [36] Communication-Efficient Weighted ADMM for Decentralized Network Optimization
    Ling, Qing; Liu, Yaohua; Shi, Wei; Tian, Zhi
    2016 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2016: 4821-4825
  • [37] More Efficient Training Strategy to Leverage Neurons in Neural Network
    Liou, Cheng-Fu; Yu, Yi-Cheng
    2024 33rd International Symposium on Industrial Electronics (ISIE 2024), 2024
  • [38] Communication-Efficient Privacy-Preserving Neural Network Inference via Arithmetic Secret Sharing
    Bi, Renwan; Xiong, Jinbo; Luo, Changqing; Ning, Jianting; Liu, Ximeng; Tian, Youliang; Zhang, Yan
    IEEE Transactions on Information Forensics and Security, 2024, 19: 6722-6737
  • [39] GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
    Tyagi, Sahil; Swany, Martin
    2023 IEEE 16th International Conference on Cloud Computing (CLOUD), 2023: 319-329
  • [40] Gist: Efficient Data Encoding for Deep Neural Network Training
    Jain, Animesh; Phanishayee, Amar; Mars, Jason; Tang, Lingjia; Pekhimenko, Gennady
    2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018: 776-789