A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning

Cited by: 0
Authors
Jiang, Wenbin [1 ]
Ye, Geyan [1 ]
Yang, Laurence T. [2 ,3 ]
Zhu, Jian [1 ]
Ma, Yang [1 ]
Xie, Xia [1 ]
Jin, Hai [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Cyber Phys Social Syst Lab, Wuhan 430074, Hubei, Peoples R China
[3] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS, Canada
Funding
National Natural Science Foundation of China;
Keywords
Deep Learning; Distributed SGD Algorithms; Parameter Servers; Heterogeneous Cluster Systems;
DOI
10.1109/CCGRID.2019.00053
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
On heterogeneous cluster systems, the convergence of neural network models suffers greatly from the performance differences among machines. In this paper, we propose a novel distributed Stochastic Gradient Descent (SGD) algorithm named Grouping-SGD for distributed deep learning, which converges faster than Sync-SGD, Async-SGD, and Stale-SGD. In Grouping-SGD, machines are partitioned into multiple groups such that machines in the same group have similar performance. Machines within a group update the model synchronously, while different groups update the model asynchronously. To further improve the performance of Grouping-SGD, the parameter servers are arranged from fast to slow and are responsible for updating the model parameters from the lower layers to the higher layers, respectively. The experimental results indicate that Grouping-SGD achieves 1.2 to 3.7 times speedups over Sync-SGD, Async-SGD, and Stale-SGD on popular image classification benchmarks: MNIST, Cifar10, Cifar100, and ImageNet.
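The abstract describes the grouping and hybrid sync/async update scheme only in prose. The following is a minimal sketch of that idea, not the paper's actual implementation: it assumes a toy linear-regression model, simulated worker speeds, and a sequential loop standing in for true asynchrony across groups; the names `group_workers_by_speed` and `GroupingSGDServer` are illustrative, and the per-layer assignment of fast parameter servers to lower layers is omitted for brevity.

```python
import numpy as np

def group_workers_by_speed(speeds, num_groups):
    """Partition worker indices so workers in a group have similar speed."""
    order = np.argsort(speeds)[::-1]          # fastest workers first
    return np.array_split(order, num_groups)  # contiguous bands of similar speed

class GroupingSGDServer:
    """Toy parameter server: synchronous averaging inside a group. In a real
    deployment each group would push its update asynchronously (no barrier
    across groups); this sequential simulation only approximates that."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def apply_group_update(self, grads):
        # One synchronous step for a group: average its members' gradients.
        self.w -= self.lr * np.mean(grads, axis=0)

# Toy regression data: y = X @ w_true + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=256)

speeds = rng.uniform(0.2, 1.0, size=8)        # 8 workers with unequal speeds
groups = group_workers_by_speed(speeds, num_groups=2)
server = GroupingSGDServer(dim=4)

for step in range(200):
    for group in groups:                      # stands in for async group progress
        grads = []
        for worker in group:                  # group members step in lock-step
            idx = rng.choice(len(X), size=32) # each worker's local mini-batch
            err = X[idx] @ server.w - y[idx]
            grads.append(X[idx].T @ err / len(idx))
        server.apply_group_update(grads)      # one synchronous update per group

print("learned:", np.round(server.w, 2), "true:", w_true)
```

The design intuition this sketch captures is that grouping by speed keeps fast workers from waiting on slow ones (unlike fully synchronous SGD) while bounding the staleness of any single update to the lag of one group (unlike fully asynchronous SGD).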
Pages: 391-398
Page count: 8
Related Papers
50 records in total
  • [21] An Implementation of a Distributed Stochastic Gradient Descent for Recommender Systems Based on Map-Reduce
    Pozo, Manuel
    Chiky, Raja
    2015 INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE FOR MULTIMEDIA UNDERSTANDING (IWCIM), 2015,
  • [22] Distributed mirror descent algorithm over unbalanced digraphs based on gradient weighting technique
    Shi, Chong-Xiao
    Yang, Guang-Hong
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (14) : 10656 - 10680
  • [23] Byzantine Fault Tolerant Distributed Stochastic Gradient Descent Based on Over-the-Air Computation
    Park, Sangjun
    Choi, Wan
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2022, 70 (05) : 3204 - 3219
  • [24] A large-scale stochastic gradient descent algorithm over a graphon
    Chen, Yan
    Li, Tao
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 4806 - 4811
  • [25] Improvement of SPGD by Gradient Descent Optimization Algorithm in Deep Learning
    Zhao, Qingsong
    Hao, Shiqi
    Wang, Yong
    Wang, Lei
    Lin, Zhi
    2022 ASIA COMMUNICATIONS AND PHOTONICS CONFERENCE, ACP, 2022, : 469 - 472
  • [26] A Stochastic Gradient Descent Algorithm Based on Adaptive Differential Privacy
    Deng, Yupeng
    Li, Xiong
    He, Jiabei
    Liu, Yuzhen
    Liang, Wei
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT II, 2022, 461 : 133 - 152
  • [27] Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning: Examining Distributed and Centralized Stochastic Gradient Descent
    Pu, Shi
    Olshevsky, Alex
    Paschalidis, Ioannis Ch.
    IEEE SIGNAL PROCESSING MAGAZINE, 2020, 37 (03) : 114 - 122
  • [28] Distributed electromagnetic target identification based on decentralized stochastic gradient descent
    Wang H.
    Huang D.
    Zhang W.
    Pan Y.
    Wang X.
    Shao H.
    Gu J.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2023, 45 (10) : 3024 - 3031
  • [29] A Hierarchical, bulk-synchronous stochastic gradient descent algorithm for deep-learning applications on GPU clusters
    Cong, Guojing
    Bhardwaj, Onkar
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 818 - 821
  • [30] Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems
    Xu, Lijie
    Qiu, Shuang
    Yuan, Binhang
    Jiang, Jiawei
    Renggli, Cedric
    Gan, Shaoduo
    Kara, Kaan
    Li, Guoliang
    Liu, Ji
    Wu, Wentao
    Ye, Jieping
    Zhang, Ce
    VLDB JOURNAL, 2024, 33 (05) : 1231 - 1255