A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning

Citations: 0
Authors
Jiang, Wenbin [1 ]
Ye, Geyan [1 ]
Yang, Laurence T. [2 ,3 ]
Zhu, Jian [1 ]
Ma, Yang [1 ]
Xie, Xia [1 ]
Jin, Hai [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Cyber Phys Social Syst Lab, Wuhan 430074, Hubei, Peoples R China
[3] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS, Canada
Funding
National Natural Science Foundation of China;
Keywords
Deep Learning; Distributed SGD Algorithms; Parameter Servers; Heterogeneous Cluster Systems;
DOI
10.1109/CCGRID.2019.00053
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
On heterogeneous cluster systems, the convergence performance of neural network models is severely degraded by performance differences among machines. In this paper, we propose a novel distributed Stochastic Gradient Descent (SGD) algorithm named Grouping-SGD for distributed deep learning, which converges faster than Sync-SGD, Async-SGD, and Stale-SGD. In Grouping-SGD, machines are partitioned into multiple groups so that machines in the same group have similar performance. Machines within a group update the model synchronously, while different groups update the model asynchronously. To further improve the performance of Grouping-SGD, the parameter servers are arranged from fast to slow and are made responsible for updating the model parameters from the lower layers to the higher layers, respectively. The experimental results indicate that Grouping-SGD achieves 1.2x to 3.7x speedups over Sync-SGD, Async-SGD, and Stale-SGD on popular image classification benchmarks: MNIST, Cifar10, Cifar100, and ImageNet.
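The grouping scheme described in the abstract can be illustrated with a minimal, self-contained sketch (not the authors' implementation): workers are partitioned into groups by measured throughput, gradients are averaged synchronously within each group, and each group applies its update to the shared parameters asynchronously. The function names (partition_by_speed, toy_gradient, grouping_sgd), the simulated throughputs, and the toy quadratic objective are all illustrative assumptions.

import numpy as np

def partition_by_speed(throughputs, num_groups):
    # Sort workers from fastest to slowest, then split the sorted index
    # list into contiguous chunks so group members have similar speed.
    order = np.argsort(throughputs)[::-1]
    return np.array_split(order, num_groups)

def toy_gradient(params, rng):
    # Noisy gradient of f(w) = 0.5 * ||w||^2, standing in for a minibatch gradient.
    return params + 0.1 * rng.standard_normal(params.shape)

def grouping_sgd(num_workers=8, num_groups=3, steps=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    throughputs = rng.uniform(0.5, 2.0, size=num_workers)  # simulated machine speeds
    groups = partition_by_speed(throughputs, num_groups)
    params = rng.standard_normal(10)                        # shared model parameters

    # A group finishes one synchronous step after 1 / (its slowest member's speed).
    group_period = [1.0 / throughputs[g].min() for g in groups]
    next_ready = list(group_period)

    for _ in range(steps):
        gid = int(np.argmin(next_ready))  # group that finishes next in simulated time
        g = groups[gid]
        # Synchronous inside the group: average the members' gradients.
        grad = np.mean([toy_gradient(params, rng) for _ in g], axis=0)
        # Asynchronous across groups: apply the update to the shared parameters immediately.
        params -= lr * grad
        next_ready[gid] += group_period[gid]
    return params

if __name__ == "__main__":
    print("final parameter norm:", float(np.linalg.norm(grouping_sgd())))

Because each group waits only for its own slowest member, a slow machine delays its group rather than the whole cluster, which is the stated motivation for grouping on heterogeneous systems.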
Pages: 391-398
Page count: 8
Related Papers
50 in total
  • [1] An efficient, distributed stochastic gradient descent algorithm for deep-learning applications
    Cong, Guojing
    Bhardwaj, Onkar
    Feng, Minwei
    2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2017, : 11 - 20
  • [2] A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 425 - 432
  • [3] Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
    Teng, Yunfei
    Gao, Wenbo
    Chalus, Francois
    Choromanska, Anna
    Goldfarb, Donald
    Weller, Adrian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
    Le Lan, Charline
    Greaves, Joshua
    Farebrother, Jesse
    Rowland, Mark
    Pedregosa, Fabian
    Agarwal, Rishabh
    Bellemare, Marc
    arXiv, 2022,
  • [5] Deep learning for sea cucumber detection using stochastic gradient descent algorithm
    Zhang, Huaqiang
    Yu, Fusheng
    Sun, Jincheng
    Shen, Xiaoqin
    Li, Kun
    EUROPEAN JOURNAL OF REMOTE SENSING, 2020, 53 (53-62) : 53 - 62
  • [6] Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures
    Ma, Yujing
    Rusu, Florin
    Wu, Kesheng
    Sim, Alexander
    2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 6 - 15
  • [7] Hierarchical Heterogeneous Cluster Systems for Scalable Distributed Deep Learning
    Wang, Yibo
    Geng, Tongsheng
    Silva, Ericson
    Gaudiot, Jean-Luc
    2024 IEEE 27TH INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING, ISORC 2024, 2024,
  • [8] GSSP: Eliminating Stragglers Through Grouping Synchronous for Distributed Deep Learning in Heterogeneous Cluster
    Sun, Haifeng
    Gui, Zhiyi
    Guo, Song
    Qi, Qi
    Wang, Jingyu
    Liao, Jianxin
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2022, 10 (04) : 2637 - 2648
  • [9] An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent
    Tu, Jun
    Zhou, Jia
    Ren, Donglin
    COMPUTER COMMUNICATIONS, 2022, 195 : 416 - 423
  • [10] Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
    Amiri, Mohammad Mohammadi
    Gunduz, Deniz
    2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2019, : 1432 - 1436