Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

Cited by: 0
Authors
Yu, Qian [1]
Maddah-Ali, Mohammad Ali [2]
Avestimehr, A. Salman [1]
Affiliations
[1] Univ Southern Calif, EE Dept, Los Angeles, CA 90089 USA
[2] Nokia Bell Labs, Murray Hill, NJ USA
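Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract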
Consider massive matrix multiplication, a problem that underlies many data analytic applications, in a large-scale distributed system comprising a group of workers. We target the straggler delay bottleneck, which is caused by the unpredictable latency of waiting for the slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named entangled polynomial code, for designing the intermediate computations at the workers so as to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We prove the optimality of the entangled polynomial code in several cases, and show that it provides an order-wise improvement over conventional schemes for straggler mitigation. Furthermore, by developing an improved version of the entangled polynomial code, we characterize the optimal recovery threshold among all linear coding strategies within a factor of 2 using bilinear complexity.
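The idea of a recovery threshold can be made concrete with a small numerical sketch. The abstract does not spell out the encoding, so the following is only an illustration under stated assumptions: it computes C = A^T B, splits A into p x m blocks and B into p x n blocks, uses the exponent pattern and the threshold pmn + p - 1 as commonly stated for the entangled polynomial code, and works over floating-point reals with hypothetical evaluation points. The names `encode` and `coded_matmul` are illustrative and do not come from the paper.

```python
import numpy as np

def encode(blocks_A, blocks_B, x, p, m, n):
    """Evaluate both encoding polynomials at point x (the task of one worker)."""
    # Assumed exponent pattern: A_{j,k} -> x^(j + k*p), B_{j,k} -> x^(p-1-j + k*p*m).
    A_enc = sum(blocks_A[j][k] * x ** (j + k * p)
                for j in range(p) for k in range(m))
    B_enc = sum(blocks_B[j][k] * x ** (p - 1 - j + k * p * m)
                for j in range(p) for k in range(n))
    return A_enc, B_enc

def coded_matmul(A, B, p, m, n, points, finished):
    """Recover A.T @ B from the workers listed in `finished` (stragglers omitted)."""
    threshold = p * m * n + p - 1  # assumed recovery threshold
    if len(finished) < threshold:
        raise ValueError("fewer finished workers than the recovery threshold")
    # Split A into p x m blocks and B into p x n blocks.
    blocks_A = [np.hsplit(rows, m) for rows in np.vsplit(A, p)]
    blocks_B = [np.hsplit(rows, n) for rows in np.vsplit(B, p)]
    used = list(finished)[:threshold]
    # Each worker multiplies its two coded blocks; this simulates that step.
    results = []
    for i in used:
        A_enc, B_enc = encode(blocks_A, blocks_B, points[i], p, m, n)
        results.append(A_enc.T @ B_enc)
    results = np.stack(results)
    # Worker outputs are evaluations of a degree-(threshold - 1) matrix polynomial,
    # so its coefficients follow from solving a Vandermonde system.
    V = np.vander(points[used], threshold, increasing=True)
    flat = np.linalg.solve(V, results.reshape(threshold, -1))
    coeffs = flat.reshape(results.shape)
    # Block (k, kk) of A.T @ B is the coefficient of x^(p-1 + k*p + kk*p*m).
    return np.block([[coeffs[p - 1 + k * p + kk * p * m] for kk in range(n)]
                     for k in range(m)])

# Hypothetical setup: 12 workers, p = m = n = 2, so any 9 results suffice.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((4, 2))
points = np.cos((2 * np.arange(12) + 1) * np.pi / 24)  # distinct real points
finished = [11, 0, 3, 5, 7, 8, 2, 10, 6]               # workers 1, 4, 9 straggle
C = coded_matmul(A, B, 2, 2, 2, points, finished)
print(np.allclose(C, A.T @ B, atol=1e-6))
```

Real-valued interpolation is used here only for brevity; Vandermonde systems become ill-conditioned as the threshold grows, so a practical implementation would likely need a different field or a better-conditioned basis.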
Pages: 2022 - 2026
Number of pages: 5
Related papers
50 in total
  • [1] Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding
    Yu, Qian
    Maddah-Ali, Mohammad Ali
    Avestimehr, A. Salman
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2020, 66 (03) : 1920 - 1933
  • [2] Distributed Matrix Multiplication Based on Frame Quantization for Straggler Mitigation
    Son, Kyungrak
    Choi, Wan
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 3058 - 3073
  • [3] Distributed Matrix Multiplication Based on Frame Quantization for Straggler Mitigation
    Son, Kyungrak
    Choi, Wan
    [J]. IEEE Transactions on Signal Processing, 2022, 70 : 3058 - 3073
  • [4] Straggler Mitigation through Unequal Error Protection for Distributed Matrix Multiplication
    Tegin, Busra
    Hernandez, Eduin E.
    Rini, Stefano
    Duman, Tolga M.
    [J]. IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
  • [5] Straggler Mitigation Through Unequal Error Protection for Distributed Approximate Matrix Multiplication
    Tegin, Busra
    Hernandez, Eduin E.
    Rini, Stefano
    Duman, Tolga M.
    [J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2022, 40 (02) : 468 - 483
  • [6] Stochastic gradient coding for straggler mitigation in distributed learning
    Bitar, Rawad
    Wootters, Mary
    El Rouayheb, Salim
    [J]. IEEE Journal on Selected Areas in Information Theory, 2020, 1 (01) : 277 - 291
  • [7] On the Capacity and Straggler-Robustness of Distributed Secure Matrix Multiplication
    Kakar, Jaber
    Ebadifar, Seyedhamed
    Sezgin, Aydin
    [J]. IEEE ACCESS, 2019, 7 : 45783 - 45799
  • [8] Stochastic Gradient Coding for Flexible Straggler Mitigation in Distributed Learning
    Bitar, Rawad
    Wootters, Mary
    El Rouayheb, Salim
    [J]. 2019 IEEE INFORMATION THEORY WORKSHOP (ITW), 2019, : 394 - 398
  • [9] Near-Optimal Straggler Mitigation for Distributed Gradient Methods
    Li, Songze
    Kalan, Seyed Mohammadreza Mousavi
    Avestimehr, A. Salman
    Soltanolkotabi, Mahdi
    [J]. 2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 857 - 866
  • [10] Securely Straggler-Exploiting Coded Computation for Distributed Matrix Multiplication
    Yang, Heecheol
    Hong, Sangwoo
    Lee, Jungwoo
    [J]. IEEE ACCESS, 2021, 9 : 167374 - 167388