Preemptive Switch Memory Usage to Accelerate Training Jobs with Shared In-Network Aggregation

Cited by: 4
Authors
Wang, Hao [1 ]
Qin, Yuxuan [1 ]
Lao, ChonLam [2 ]
Le, Yanfang [3 ]
Wu, Wenfei [4 ]
Chen, Kai [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, iSING Lab, Hong Kong, Peoples R China
[2] Harvard Univ, Cambridge, MA USA
[3] Intel, Santa Clara, CA USA
[4] Peking Univ, Beijing, Peoples R China
DOI
10.1109/ICNP59255.2023.10355574
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Code
0808; 0809
Abstract
Recent works introduce In-Network Aggregation (INA) for distributed training (DT), which offloads gradient summation onto programmable network switches. INA reduces traffic volume and accelerates communication in DT jobs. However, switch memory is a scarce resource that cannot support the massive number of DT jobs in data centers, and existing INA solutions do not use it to its full extent. We propose DSA, an efficient Data-plane switch memory Scheduler for in-network Aggregation. DSA introduces preemption into switch memory management for INA jobs. In the data plane, DSA allows high-priority gradient tensors to preempt switch aggregators (the basic computation units in INA) from low-priority tensors, which prevents aggregators from sitting idle. In the control plane, DSA devises a priority policy that assigns high priority to the gradient tensors that benefit overall job efficiency the most, e.g., those of communication-intensive jobs. We prototype DSA, and experiments show that it can improve the average job completion time (JCT) by up to 1.35x compared with baseline solutions.
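The abstract describes DSA's preemption mechanism only at a high level. Purely as a mental model, the Python sketch below simulates a fixed pool of switch aggregators in which a gradient tensor with higher priority can take over a slot held by a lower-priority tensor, with the preempted tensor falling back to host-side aggregation. All names here (AggregatorPool, acquire, the numeric priorities) are hypothetical illustrations of the idea, not the paper's actual data-plane or control-plane implementation on a programmable switch.

```python
# Minimal sketch of priority-based aggregator preemption (hypothetical names,
# simplified model; the real system runs in a programmable switch data plane).
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aggregator:
    """One switch aggregator slot: accumulates a gradient fragment."""
    owner: Optional[str] = None   # tensor currently holding this slot
    priority: int = -1            # priority of the owning tensor (-1 = free)
    value: float = 0.0            # running gradient sum


class AggregatorPool:
    """A fixed pool of aggregators with priority-based preemption."""

    def __init__(self, num_slots: int):
        self.slots = [Aggregator() for _ in range(num_slots)]

    def acquire(self, tensor_id: str, priority: int) -> Optional[Aggregator]:
        # 1) Prefer a free aggregator.
        for agg in self.slots:
            if agg.owner is None:
                agg.owner, agg.priority, agg.value = tensor_id, priority, 0.0
                return agg
        # 2) Otherwise preempt the lowest-priority owner if we outrank it.
        victim = min(self.slots, key=lambda a: a.priority)
        if victim.priority < priority:
            # The preempted tensor would fall back to host-side aggregation.
            victim.owner, victim.priority, victim.value = tensor_id, priority, 0.0
            return victim
        return None  # no aggregator available; aggregate at the end host

    def aggregate(self, agg: Aggregator, gradient: float) -> None:
        agg.value += gradient


if __name__ == "__main__":
    pool = AggregatorPool(num_slots=2)
    low = pool.acquire("job-A/tensor-0", priority=1)
    high = pool.acquire("job-B/tensor-3", priority=5)
    pool.aggregate(low, 0.5)
    # Pool is full; a higher-priority tensor preempts the low-priority slot.
    preempting = pool.acquire("job-C/tensor-7", priority=9)
    print(preempting.owner)  # -> job-C/tensor-7
```

In this toy model the control-plane policy is reduced to a single integer per tensor; the paper's contribution is deciding those priorities so that communication-intensive jobs gain the most from holding scarce aggregators.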
Pages: 12
Related Papers
14 records in total
  • [1] Accelerating Distributed Training With Collaborative In-Network Aggregation
    Fang, Jin
    Xu, Hongli
    Zhao, Gongming
    Yu, Zhuolong
    Shen, Bingchen
    Xie, Liguang
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (04) : 3437 - 3452
  • [2] Concordia: Distributed Shared Memory with In-Network Cache Coherence
    Wang, Qing
    Lu, Youyou
    Xu, Erci
    Li, Junru
    Chen, Youmin
    Shu, Jiwu
    PROCEEDINGS OF THE 19TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES (FAST '21), 2021, : 277 - 292
  • [3] GRID: Gradient Routing With In-Network Aggregation for Distributed Training
    Fang, Jin
    Zhao, Gongming
    Xu, Hongli
    Wu, Changbo
    Yu, Zhuolong
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (05) : 2267 - 2280
  • [4] Training Job Placement in Clusters with Statistical In-Network Aggregation
    Zhao, Bohan
    Xu, Wei
    Liu, Shuo
    Tian, Yang
    Wang, Qiaoling
    Wu, Wenfei
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 1, 2024, : 420 - 434
  • [5] An In-Network Architecture for Accelerating Shared-Memory Multiprocessor Collectives
    Klenk, Benjamin
    Jiang, Nan
    Thorson, Greg
    Dennison, Larry
    2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020, : 996 - 1009
  • [6] Multi-Switch Cooperative In-Network Aggregation for Distributed Deep Learning
    Su, Ming-Wei
    Li, Yuan-Yu
    Lin, Kate Ching-Ju
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 4767 - 4772
  • [7] Maximizing Aggregation Throughput for Distributed Training with Constrained In-Network Computing
    Luo, Long
    Yang, Shulin
    Wu, Hao
    Yu, Hongfang
    Lei, Bo
    Gao, Shuai
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 3652 - 3657
  • [8] PARING: Joint Task Placement and Routing for Distributed Training With In-Network Aggregation
    Qiu, Yuhang
    Zhao, Gongming
    Xu, Hongli
    Huang, He
    Qiao, Chunming
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (05) : 4317 - 4332
  • [9] InGo: In-Network Aggregation Routing with Batch Size Adjustment for Distributed Training
    Bao, Jianfeng
    Zhao, Gongming
    Xu, Hongli
    Wang, Haibo
    Yang, Peng
    2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024
  • [10] GPU memory usage optimization for backward propagation in deep network training
    Hong, Ding-Yong
    Tsai, Tzu-Hsien
    Wang, Ning
    Liu, Pangfeng
    Wu, Jan-Jan
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2025, 199