Gradient Coding: Avoiding Stragglers in Distributed Learning

Cited by: 0
Authors
Tandon, Rashish [1]
Lei, Qi [2]
Dimakis, Alexandros G. [3]
Karampatziakis, Nikos [4]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Texas Austin, Inst Computat Engn & Sci, Austin, TX 78712 USA
[3] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
[4] Microsoft, Seattle, WA USA
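Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes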
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel coding-theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent. We implement our schemes in Python (using MPI) to run on Amazon EC2, and show how we compare against baseline approaches in running time and generalization error.
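The idea of replicating data blocks and coding across gradients can be illustrated with a small sketch. The example below assumes 3 workers and tolerance to 1 straggler; the encoding matrix `B` and the helper names `encode` and `decode` are illustrative assumptions and are not taken from the authors' implementation. Each worker holds two of the three data blocks and sends one linear combination of its partial gradients, and the full gradient sum can be recovered from any two workers.

```python
import numpy as np

# Illustrative encoding matrix (an assumption for this sketch).
# Row i holds the coefficients worker i applies to the partial
# gradients g1, g2, g3. Each row has one zero, so each worker
# only needs two of the three data blocks.
B = np.array([
    [0.5, 1.0, 0.0],   # worker 0 sends g1/2 + g2
    [0.0, 1.0, -1.0],  # worker 1 sends g2 - g3
    [0.5, 0.0, 1.0],   # worker 2 sends g1/2 + g3
])


def encode(B, partial_grads):
    """Return the coded message of every worker, one per row.

    partial_grads has shape (num_blocks, dim).
    """
    return B @ partial_grads


def decode(B, coded, survivors):
    """Recover the sum of all partial gradients from the surviving workers.

    Finds coefficients a with a @ B[survivors] == [1, 1, ..., 1], then
    applies them to the surviving coded messages.
    """
    B_s = B[survivors]
    ones = np.ones(B.shape[1])
    a, *_ = np.linalg.lstsq(B_s.T, ones, rcond=None)
    if not np.allclose(a @ B_s, ones):
        raise ValueError("the surviving workers cannot recover the full gradient")
    return a @ coded[survivors]
```

With this `B`, any single worker can be ignored: for example, `decode(B, encode(B, g), [0, 1])` combines the two received messages as 2 x (worker 0) minus (worker 1), which equals g1 + g2 + g3. The cost is that every data block is stored on two workers instead of one.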
Pages: 9
Related papers
50 in total
  • [1] Distributed Learning Based on 1-Bit Gradient Coding in the Presence of Stragglers
    Li, Chengxi
    Skoglund, Mikael
    [J]. IEEE TRANSACTIONS ON COMMUNICATIONS, 2024, 72 (08) : 4903 - 4916
  • [2] Optimization-Based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning
    Wang, Qi
    Cui, Ying
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2023, 71 : 1023 - 1038
  • [3] Live Gradient Compensation for Evading Stragglers in Distributed Learning
    Xu, Jian
    Huang, Shao-Lun
    Song, Linqi
    Lan, Tian
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021), 2021
  • [4] On Arbitrary Ignorance of Stragglers with Gradient Coding
    Su, Xian
    Sukhnandan, Brian
    Li, Jun
    [J]. 2023 IEEE 43RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS, 2023 : 660 - 670
  • [5] Stochastic gradient coding for straggler mitigation in distributed learning
    Bitar, Rawad
    Wootters, Mary
    El Rouayheb, Salim
    [J]. IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY, 2020, 1 (01) : 277 - 291
  • [6] Heterogeneity-Aware Gradient Coding for Tolerating and Leveraging Stragglers
    Wang, Haozhao
    Guo, Song
    Tang, Bin
    Li, Ruixuan
    Yang, Yutong
    Qu, Zhihao
    Wang, Yi
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (04) : 779 - 794
  • [7] Gradient Coding Based on Block Designs for Mitigating Adversarial Stragglers
    Kadhe, Swanand
    Koyluoglu, O. Ozan
    Ramchandran, Kannan
    [J]. 2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2019 : 2813 - 2817
  • [8] Stochastic Gradient Coding for Flexible Straggler Mitigation in Distributed Learning
    Bitar, Rawad
    Wootters, Mary
    El Rouayheb, Salim
    [J]. 2019 IEEE INFORMATION THEORY WORKSHOP (ITW), 2019 : 394 - 398
  • [9] Balancing Stragglers Against Staleness in Distributed Deep Learning
    Basu, Saurav
    Saxena, Vaibhav
    Panja, Rintu
    Verma, Ashish
    [J]. 2018 IEEE 25TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2018 : 12 - 21
  • [10] Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
    Hanna, Serge Kas
    Bitar, Rawad
    Parag, Parimal
    Dasari, Venkat
    El Rouayheb, Salim
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 4262 - 4266