A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning

Citations: 0
Authors
Jiang, Wenbin [1 ]
Ye, Geyan [1 ]
Yang, Laurence T. [2 ,3 ]
Zhu, Jian [1 ]
Ma, Yang [1 ]
Xie, Xia [1 ]
Jin, Hai [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Cyber Phys Social Syst Lab, Wuhan 430074, Hubei, Peoples R China
[3] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS, Canada
Funding
National Natural Science Foundation of China;
Keywords
Deep Learning; Distributed SGD Algorithms; Parameter Servers; Heterogeneous Cluster Systems;
DOI
10.1109/CCGRID.2019.00053
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
On heterogeneous cluster systems, the convergence performance of neural network models is severely degraded by performance differences among machines. In this paper, we propose a novel distributed Stochastic Gradient Descent (SGD) algorithm named Grouping-SGD for distributed deep learning, which converges faster than Sync-SGD, Async-SGD, and Stale-SGD. In Grouping-SGD, machines are partitioned into multiple groups so that machines in the same group have similar performance. Machines within a group update the model synchronously, while different groups update the model asynchronously. To further improve the performance of Grouping-SGD, the parameter servers are arranged from fast to slow and are made responsible for updating the model parameters from the lower layers to the higher layers, respectively. The experimental results indicate that Grouping-SGD achieves 1.2x to 3.7x speedups over Sync-SGD, Async-SGD, and Stale-SGD on popular image classification benchmarks: MNIST, Cifar10, Cifar100, and ImageNet.
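The grouping scheme described in the abstract can be illustrated with a minimal, self-contained sketch (not the authors' implementation): workers are partitioned into groups by measured throughput, gradients are averaged synchronously within each group, and each group applies its update to the shared parameters asynchronously. The function names (partition_by_speed, toy_gradient, grouping_sgd), the simulated throughputs, and the toy quadratic objective are all illustrative assumptions.

import numpy as np

def partition_by_speed(throughputs, num_groups):
    # Sort workers from fastest to slowest, then split the sorted index
    # list into contiguous chunks so group members have similar speed.
    order = np.argsort(throughputs)[::-1]
    return np.array_split(order, num_groups)

def toy_gradient(params, rng):
    # Noisy gradient of f(w) = 0.5 * ||w||^2, standing in for a minibatch gradient.
    return params + 0.1 * rng.standard_normal(params.shape)

def grouping_sgd(num_workers=8, num_groups=3, steps=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    throughputs = rng.uniform(0.5, 2.0, size=num_workers)  # simulated machine speeds
    groups = partition_by_speed(throughputs, num_groups)
    params = rng.standard_normal(10)                        # shared model parameters

    # A group finishes one synchronous step after 1 / (its slowest member's speed).
    group_period = [1.0 / throughputs[g].min() for g in groups]
    next_ready = list(group_period)

    for _ in range(steps):
        gid = int(np.argmin(next_ready))  # group that finishes next in simulated time
        g = groups[gid]
        # Synchronous inside the group: average the members' gradients.
        grad = np.mean([toy_gradient(params, rng) for _ in g], axis=0)
        # Asynchronous across groups: apply the update to the shared parameters immediately.
        params -= lr * grad
        next_ready[gid] += group_period[gid]
    return params

if __name__ == "__main__":
    print("final parameter norm:", float(np.linalg.norm(grouping_sgd())))

Because each group waits only for its own slowest member, a slow machine delays its group rather than the whole cluster, which is the stated motivation for grouping on heterogeneous systems.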
Pages: 391-398
Page count: 8
Related Papers
50 in total
  • [1] An efficient, distributed stochastic gradient descent algorithm for deep-learning applications
    Cong, Guojing
    Bhardwaj, Onkar
    Feng, Minwei
    2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2017, : 11 - 20
  • [2] A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 425 - 432
  • [3] Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
    Teng, Yunfei
    Gao, Wenbo
    Chalus, Francois
    Choromanska, Anna
    Goldfarb, Donald
    Weller, Adrian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
    Le Lan, Charline
    Greaves, Joshua
    Farebrother, Jesse
    Rowland, Mark
    Pedregosa, Fabian
    Agarwal, Rishabh
    Bellemare, Marc
    arXiv, 2022,
  • [5] Deep learning for sea cucumber detection using stochastic gradient descent algorithm
    Zhang, Huaqiang
    Yu, Fusheng
    Sun, Jincheng
    Shen, Xiaoqin
    Li, Kun
    EUROPEAN JOURNAL OF REMOTE SENSING, 2020, 53 (53-62) : 53 - 62
  • [6] Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures
    Ma, Yujing
    Rusu, Florin
    Wu, Kesheng
    Sim, Alexander
    2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 6 - 15
  • [7] Hierarchical Heterogeneous Cluster Systems for Scalable Distributed Deep Learning
    Wang, Yibo
    Geng, Tongsheng
    Silva, Ericson
    Gaudiot, Jean-Luc
    2024 IEEE 27TH INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING, ISORC 2024, 2024,
  • [8] GSSP: Eliminating Stragglers Through Grouping Synchronous for Distributed Deep Learning in Heterogeneous Cluster
    Sun, Haifeng
    Gui, Zhiyi
    Guo, Song
    Qi, Qi
    Wang, Jingyu
    Liao, Jianxin
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2022, 10 (04) : 2637 - 2648
  • [9] An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent
    Tu, Jun
    Zhou, Jia
    Ren, Donglin
    COMPUTER COMMUNICATIONS, 2022, 195 : 416 - 423
  • [10] Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
    Amiri, Mohammad Mohammadi
    Gunduz, Deniz
    2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2019, : 1432 - 1436