Predicting Throughput of Distributed Stochastic Gradient Descent

Cited by: 0
Authors
Li, Zhuojin [1]
Paolieri, Marco [1]
Golubchik, Leana [1]
Lin, Sung-Han [2]
Yan, Wumo [1]
Affiliations
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[2] Meta, Menlo Pk, CA 94025 USA
Keywords
Computational modeling; Predictive models; Training; Throughput; Servers; Computer architecture; Uplink; Distributed machine learning; stochastic gradient descent; performance prediction; scalability; PyTorch
DOI
10.1109/TPDS.2022.3151739
CLC number
TP301 [Theory and Methods]
Discipline code
081202
Abstract
Training jobs of deep neural networks (DNNs) can be accelerated through distributed variants of stochastic gradient descent (SGD), where multiple nodes process training examples and exchange updates. The total throughput of the nodes depends not only on their computing power, but also on their networking speeds and coordination mechanism (synchronous or asynchronous, centralized or decentralized), since communication bottlenecks and stragglers can result in sublinear scaling when additional nodes are provisioned. In this paper, we propose two classes of performance models to predict throughput of distributed SGD: fine-grained models, representing many elementary computation/communication operations and their dependencies; and coarse-grained models, where SGD steps at each node are represented as a sequence of high-level phases without parallelism between computation and communication. Using a PyTorch implementation, real-world DNN models and different cloud environments, our experimental evaluation illustrates that, while fine-grained models are more accurate and can be easily adapted to new variants of distributed SGD, coarse-grained models can provide similarly accurate predictions when augmented with ad hoc heuristics, and their parameters can be estimated with profiling information that is easier to collect.
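As a rough illustration of the coarse-grained approach described in the abstract, the Python sketch below models one step of synchronous, centralized data-parallel SGD as a sequence of non-overlapping phases (forward pass, backward pass, gradient exchange) and derives aggregate throughput from the per-step time. The ring all-reduce cost formula, the profiled phase durations, and the bandwidth figure are illustrative assumptions for this sketch, not the paper's fitted model or parameters.

# A minimal sketch of a coarse-grained throughput model for synchronous,
# centralized data-parallel SGD. The phase decomposition follows the
# coarse-grained idea in the abstract (no overlap between computation and
# communication); the specific cost formulas and numbers below are
# illustrative assumptions, not the authors' calibrated model.

def allreduce_seconds(grad_bytes: float, n_workers: int, bandwidth_bps: float) -> float:
    """Communication phase: a standard ring all-reduce transfers
    2 * (n - 1) / n of the gradient volume per worker."""
    if n_workers == 1:
        return 0.0
    volume_bytes = 2.0 * (n_workers - 1) / n_workers * grad_bytes
    return volume_bytes * 8.0 / bandwidth_bps  # bytes -> bits, divided by link speed

def step_seconds(fwd_s: float, bwd_s: float, grad_bytes: float,
                 n_workers: int, bandwidth_bps: float) -> float:
    """Coarse-grained model: one SGD step is a sequence of high-level
    phases, with no parallelism between computation and communication."""
    return fwd_s + bwd_s + allreduce_seconds(grad_bytes, n_workers, bandwidth_bps)

def throughput(batch_per_worker: int, fwd_s: float, bwd_s: float,
               grad_bytes: float, n_workers: int, bandwidth_bps: float) -> float:
    """Aggregate training examples processed per second across all workers."""
    t = step_seconds(fwd_s, bwd_s, grad_bytes, n_workers, bandwidth_bps)
    return n_workers * batch_per_worker / t

# Example with a hypothetical profile: ~100 MB of gradients
# (roughly ResNet-50 sized) on 10 Gb/s links.
if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        ex_per_s = throughput(batch_per_worker=32, fwd_s=0.040, bwd_s=0.080,
                              grad_bytes=100e6, n_workers=n, bandwidth_bps=10e9)
        print(f"{n:2d} workers: {ex_per_s:8.1f} examples/s")

Even this simple model reproduces the sublinear scaling the abstract discusses: as workers are added, per-worker compute time stays fixed while the all-reduce term grows toward a bandwidth-bound limit, so aggregate throughput flattens.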
Pages: 2900-2912
Number of pages: 13