Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning

Cited by: 0
Authors
Yoon, Daegun [1 ]
Oh, Sangyoon [2 ]
Affiliations
[1] ETRI, Daejeon, South Korea
[2] Ajou Univ, Suwon, South Korea
Keywords
distributed deep learning; gradient sparsification; scalability;
DOI
10.1109/CCGrid59990.2024.00043
CLC number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Communication overhead is a major obstacle to scaling distributed training systems. Gradient sparsification is a promising optimization that reduces communication volume without significant loss of model fidelity. However, existing gradient sparsification methods scale poorly because of inefficient algorithm design, which significantly raises communication overhead. In particular, gradient build-up and inadequate sparsity control degrade sparsification performance considerably. Moreover, communication traffic increases drastically owing to an imbalanced gradient-selection workload between workers. To address these challenges, we propose a novel gradient sparsification scheme called ExDyna. In ExDyna, the gradient tensor of the model is divided into fine-grained blocks, and contiguous blocks are grouped into non-overlapping partitions. Each worker selects gradients only within its exclusively allocated partition, so gradient build-up never occurs. To balance the gradient-selection workload between workers, ExDyna adjusts the topology of the partitions by comparing the workloads of adjacent partitions. In addition, ExDyna supports online threshold scaling, which estimates an accurate threshold for gradient selection on-the-fly. Accordingly, ExDyna satisfies the user-required sparsity level throughout training, regardless of the model or dataset. ExDyna therefore enhances the scalability of distributed training systems by preserving a near-optimal gradient sparsification cost. In experiments, ExDyna outperformed state-of-the-art sparsifiers in terms of training speed and sparsification performance while achieving high accuracy.
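The abstract outlines ExDyna's two core mechanisms: partition-exclusive gradient selection and online threshold scaling. The following is a minimal sketch of those ideas for a single worker, assuming a flattened gradient array and a simple multiplicative threshold update; the names (select_partition, scale_threshold, TARGET_DENSITY) and the update rule are illustrative assumptions, not the paper's actual implementation.

import numpy as np

# Hypothetical sketch: names and the multiplicative update rule are
# illustrative assumptions, not ExDyna's actual implementation.
TARGET_DENSITY = 0.01  # user-required sparsity level (keep ~1% of gradients)

def select_partition(grad, start, end, threshold):
    # Each worker selects gradients only inside its exclusive partition
    # [start, end); no other worker selects here, so no gradient build-up.
    local = grad[start:end]
    idx = np.nonzero(np.abs(local) >= threshold)[0]
    return idx + start, local[idx]

def scale_threshold(threshold, num_selected, partition_size):
    # Online threshold scaling: nudge the threshold so the observed density
    # tracks TARGET_DENSITY in the next iteration.
    density = num_selected / max(partition_size, 1)
    ratio = density / TARGET_DENSITY
    # Raise the threshold when too many gradients were picked,
    # lower it when too few were picked, with a clamp for stability.
    return threshold * min(max(ratio, 0.5), 2.0)

# Toy usage for one worker and one iteration.
rng = np.random.default_rng(0)
grad = rng.standard_normal(1_000_000).astype(np.float32)
start, end = 0, 250_000          # this worker's exclusively allocated partition
threshold = 1e-3
indices, values = select_partition(grad, start, end, threshold)
threshold = scale_threshold(threshold, len(indices), end - start)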
Pages: 320-329
Page count: 10
Related papers
50 in total
  • [1] Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning
    Yoon, Daegun
    Oh, Sangyoon
arXiv
  • [2] AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost
    Chen, Jinfan
    Li, Shigang
    Guo, Ran
    Yuan, Jinhui
    Hoefler, Torsten
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (08) : 1331 - 1344
  • [3] Near-Optimal Sparse Allreduce for Distributed Deep Learning
    Li, Shigang
    Hoefler, Torsten
    PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2022, : 135 - 149
  • [4] Near-Optimal Straggler Mitigation for Distributed Gradient Methods
    Li, Songze
    Kalan, Seyed Mohammadreza Mousavi
    Avestimehr, A. Salman
    Soltanolkotabi, Mahdi
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 857 - 866
  • [5] Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning
    Yan, Zijie
    Xiao, Danyang
    Chen, Mengqiang
    Zhou, Jieying
    Wu, Weigang
PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020
  • [6] OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning
    Gao, Yunqi
    Zhang, Zechao
    Hu, Bing
    Jin, A-Long
    Wu, Chunming
    PARALLEL COMPUTING, 2023, 118
  • [7] Distributed near-optimal matching
    Deng, XT
    COMBINATORICA, 1996, 16 (04) : 453 - 464
  • [8] Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    Qin, Yang
    Liu, Ruihao
    Zhao, Xinxiao
    IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2020, : 406 - 415
  • [9] Scalable distributed algorithms for multi-robot near-optimal motion planning
    Zhao, Guoxiang
    Zhu, Minghui
    AUTOMATICA, 2022, 140
  • [10] Scalable distributed algorithms for multi-robot near-optimal motion planning
    Zhao, Guoxiang
    Zhu, Minghui
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 226 - 231