A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning

Cited by: 0
Authors
Jiang, Wenbin [1 ]
Ye, Geyan [1 ]
Yang, Laurence T. [2 ,3 ]
Zhu, Jian [1 ]
Ma, Yang [1 ]
Xie, Xia [1 ]
Jin, Hai [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Cyber Phys Social Syst Lab, Wuhan 430074, Hubei, Peoples R China
[3] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS, Canada
Funding
National Natural Science Foundation of China;
Keywords
Deep Learning; Distributed SGD Algorithms; Parameter Servers; Heterogeneous Cluster Systems;
DOI
10.1109/CCGRID.2019.00053
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
On heterogeneous cluster systems, the convergence of neural network models suffers greatly from the performance differences among machines. In this paper, we propose a novel distributed Stochastic Gradient Descent (SGD) algorithm named Grouping-SGD for distributed deep learning, which converges faster than Sync-SGD, Async-SGD, and Stale-SGD. In Grouping-SGD, machines are partitioned into multiple groups such that machines in the same group have similar performance. Machines within a group update the model synchronously, while different groups update the model asynchronously. To further improve the performance of Grouping-SGD, the parameter servers are arranged from fast to slow and are responsible for updating the model parameters from the lower layers to the higher layers, respectively. The experimental results indicate that Grouping-SGD achieves 1.2 to 3.7 times speedups over Sync-SGD, Async-SGD, and Stale-SGD on popular image classification benchmarks: MNIST, Cifar10, Cifar100, and ImageNet.
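The abstract describes the grouping and hybrid sync/async update scheme only in prose. The following is a minimal sketch of that idea, not the paper's actual implementation: it assumes a toy linear-regression model, simulated worker speeds, and a sequential loop standing in for true asynchrony across groups; the names `group_workers_by_speed` and `GroupingSGDServer` are illustrative, and the per-layer assignment of fast parameter servers to lower layers is omitted for brevity.

```python
import numpy as np

def group_workers_by_speed(speeds, num_groups):
    """Partition worker indices so workers in a group have similar speed."""
    order = np.argsort(speeds)[::-1]          # fastest workers first
    return np.array_split(order, num_groups)  # contiguous bands of similar speed

class GroupingSGDServer:
    """Toy parameter server: synchronous averaging inside a group. In a real
    deployment each group would push its update asynchronously (no barrier
    across groups); this sequential simulation only approximates that."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def apply_group_update(self, grads):
        # One synchronous step for a group: average its members' gradients.
        self.w -= self.lr * np.mean(grads, axis=0)

# Toy regression data: y = X @ w_true + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=256)

speeds = rng.uniform(0.2, 1.0, size=8)        # 8 workers with unequal speeds
groups = group_workers_by_speed(speeds, num_groups=2)
server = GroupingSGDServer(dim=4)

for step in range(200):
    for group in groups:                      # stands in for async group progress
        grads = []
        for worker in group:                  # group members step in lock-step
            idx = rng.choice(len(X), size=32) # each worker's local mini-batch
            err = X[idx] @ server.w - y[idx]
            grads.append(X[idx].T @ err / len(idx))
        server.apply_group_update(grads)      # one synchronous update per group

print("learned:", np.round(server.w, 2), "true:", w_true)
```

The design intuition this sketch captures is that grouping by speed keeps fast workers from waiting on slow ones (unlike fully synchronous SGD) while bounding the staleness of any single update to the lag of one group (unlike fully asynchronous SGD).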
Pages: 391-398
Page count: 8
Related Papers
50 records in total
  • [21] An Implementation of a Distributed Stochastic Gradient Descent for Recommender Systems Based on Map-Reduce
    Pozo, Manuel
    Chiky, Raja
    2015 INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE FOR MULTIMEDIA UNDERSTANDING (IWCIM), 2015,
  • [22] Distributed mirror descent algorithm over unbalanced digraphs based on gradient weighting technique
    Shi, Chong-Xiao
    Yang, Guang-Hong
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (14) : 10656 - 10680
  • [23] Byzantine Fault Tolerant Distributed Stochastic Gradient Descent Based on Over-the-Air Computation
    Park, Sangjun
    Choi, Wan
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2022, 70 (05) : 3204 - 3219
  • [24] A large-scale stochastic gradient descent algorithm over a graphon
    Chen, Yan
    Li, Tao
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 4806 - 4811
  • [25] Improvement of SPGD by Gradient Descent Optimization Algorithm in Deep Learning
    Zhao, Qingsong
    Hao, Shiqi
    Wang, Yong
    Wang, Lei
    Lin, Zhi
    2022 ASIA COMMUNICATIONS AND PHOTONICS CONFERENCE, ACP, 2022, : 469 - 472
  • [26] A Stochastic Gradient Descent Algorithm Based on Adaptive Differential Privacy
    Deng, Yupeng
    Li, Xiong
    He, Jiabei
    Liu, Yuzhen
    Liang, Wei
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT II, 2022, 461 : 133 - 152
  • [27] Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning: Examining Distributed and Centralized Stochastic Gradient Descent
    Pu, Shi
    Olshevsky, Alex
    Paschalidis, Ioannis Ch.
    IEEE SIGNAL PROCESSING MAGAZINE, 2020, 37 (03) : 114 - 122
  • [28] Distributed electromagnetic target identification based on decentralized stochastic gradient descent
    Wang H.
    Huang D.
    Zhang W.
    Pan Y.
    Wang X.
    Shao H.
    Gu J.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2023, 45 (10) : 3024 - 3031
  • [29] A Hierarchical, bulk-synchronous stochastic gradient descent algorithm for deep-learning applications on GPU clusters
    Cong, Guojing
    Bhardwaj, Onkar
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 818 - 821
  • [30] Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems
    Xu, Lijie
    Qiu, Shuang
    Yuan, Binhang
    Jiang, Jiawei
    Renggli, Cedric
    Gan, Shaoduo
    Kara, Kaan
    Li, Guoliang
    Liu, Ji
    Wu, Wentao
    Ye, Jieping
    Zhang, Ce
    VLDB JOURNAL, 2024, 33 (05) : 1231 - 1255