Enabling Efficient Large-Scale Deep Learning Training with Cache Coherent Disaggregated Memory Systems

Cited by: 4
Authors
Wang, Zixuan [1 ]
Sim, Joonseop [2 ]
Lim, Euicheol [2 ]
Zhao, Jishen [1 ]
Affiliations
[1] University of California San Diego, San Diego, CA 92103, USA
[2] SK Hynix, System Architecture Division, Icheon-si, South Korea
Keywords
deep learning; training; cache coherence
DOI
10.1109/HPCA53966.2022.00018
CLC Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Modern deep learning (DL) training is memory-consuming, constrained by the memory capacity of each computation component and by cross-device communication bandwidth. In response to such constraints, current approaches include increasing parallelism in distributed training and optimizing inter-device communication. However, model parameter communication is becoming a key performance bottleneck in distributed DL training. To improve parameter communication performance, we propose COARSE, a disaggregated memory extension for distributed DL training. COARSE is built on modern cache-coherent interconnect (CCI) protocols and MPI-like collective communication for synchronization, allowing low-latency, parallel access to training data and model parameters shared among worker GPUs. To enable high-bandwidth transfers between GPUs and the disaggregated memory system, we propose a decentralized parameter communication scheme that decouples and localizes parameter synchronization traffic. Furthermore, we propose dynamic tensor routing and partitioning to fully utilize the non-uniform serial bus bandwidth that varies across different cloud computing systems. Finally, we design deadlock-avoidance and dual-synchronization mechanisms to ensure high-performance parameter synchronization. Our evaluation shows that COARSE achieves up to 48.3% faster DL training than state-of-the-art MPI AllReduce communication.
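To make the communication scheme concrete, below is a minimal Python sketch of the decentralized, reduce-scatter/all-gather style of parameter synchronization the abstract describes: each worker pushes gradient slices to per-shard home locations in a shared memory pool, then reads back the fully reduced result after a synchronization point. This is an illustration only, not COARSE's implementation; all names (SharedPool, accumulate, sync_step) are hypothetical, and Python threads with locks and a barrier stand in for worker GPUs and CCI hardware coherence.

import threading
import numpy as np


class SharedPool:
    """Stand-in for a cache-coherent disaggregated memory region.

    Each worker "owns" one shard of the parameter buffer; gradients for
    that shard are reduced in place at the shard's home location, which
    decentralizes and localizes the synchronization traffic.
    """

    def __init__(self, n_workers, n_params):
        self.n_workers = n_workers
        self.shards = np.array_split(np.zeros(n_params), n_workers)
        self.locks = [threading.Lock() for _ in range(n_workers)]
        self.barrier = threading.Barrier(n_workers)

    def accumulate(self, shard_id, grad_shard):
        # A write into the shared region; a CCI protocol would keep worker
        # caches coherent here without explicit message passing.
        with self.locks[shard_id]:
            self.shards[shard_id] += grad_shard

    def read(self, shard_id):
        with self.locks[shard_id]:
            return self.shards[shard_id].copy()


def sync_step(rank, pool, local_grad, out):
    # Phase 1 (reduce-scatter): push each slice of the local gradient to
    # that slice's home shard. Each call holds at most one lock at a time,
    # so lock-ordering deadlock cannot occur in this sketch.
    for shard_id, g in enumerate(np.array_split(local_grad, pool.n_workers)):
        pool.accumulate(shard_id, g)
    # The barrier plays the role of the synchronization step: no worker
    # reads until every worker has contributed its gradient.
    pool.barrier.wait()
    # Phase 2 (all-gather): read back the fully reduced gradient.
    out[rank] = np.concatenate([pool.read(s) for s in range(pool.n_workers)])


if __name__ == "__main__":
    n_workers, n_params = 4, 16
    pool = SharedPool(n_workers, n_params)
    grads = [np.full(n_params, float(r + 1)) for r in range(n_workers)]
    out = [None] * n_workers
    threads = [
        threading.Thread(target=sync_step, args=(r, pool, grads[r], out))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every worker observes the same reduced gradient: 1 + 2 + 3 + 4 = 10.
    assert all(np.allclose(g, 10.0) for g in out)
    print(out[0])

The design point this sketch illustrates is that reducing each shard at a single home location confines that shard's synchronization traffic to one place, rather than routing every worker's full parameter set through a central reducer or a ring of peers; per the abstract, this decoupling and localization is what lets COARSE outperform MPI AllReduce.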
Pages: 126-140
Page count: 15
Related Papers
50 items in total
  • [31] Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
    Li, Shenggui
    Liu, Hongxin
    Bian, Zhengda
    Fang, Jiarui
    Huang, Haichen
    Liu, Yuliang
    Wang, Boxiang
    You, Yang
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023 : 766 - 775
  • [32] Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale
    Truong, Thao-Nguyen
    Takano, Ryousei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (08) : 1332 - 1339
  • [33] Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR
    Long, Yanhua
    Li, Yijie
    Wei, Shuang
    Zhang, Qiaozheng
    Yang, Chunxia
    IEEE ACCESS, 2019, 7 : 133615 - 133627
  • [34] Enabling Parallel Simulation of Large-Scale HPC Network Systems
    Mubarak, Misbah
    Carothers, Christopher D.
    Ross, Robert B.
    Carns, Philip
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (01) : 87 - 100
  • [35] Latch: Enabling large-scale automated testing on constrained systems
    Lauwaerts, T.
    Marr, S.
    Scholliers, C.
    SCIENCE OF COMPUTER PROGRAMMING, 2024, 238
  • [36] Designing Reconfigurable Large-Scale Deep Learning Systems Using Stochastic Computing
    Ren, Ao
    Li, Zhe
    Wang, Yanzhi
    Qiu, Qinru
    Yuan, Bo
    2016 IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING (ICRC), 2016
  • [37] Enabling large-scale screening of Barrett's esophagus using weakly supervised deep learning in histopathology
    Bouzid, Kenza
    Sharma, Harshita
    Killcoyne, Sarah
    Castro, Daniel C.
    Schwaighofer, Anton
    Ilse, Max
    Salvatelli, Valentina
    Oktay, Ozan
    Murthy, Sumanth
    Bordeaux, Lucas
    Moore, Luiza
    O'Donovan, Maria
    Thieme, Anja
    Nori, Aditya
    Gehrung, Marcel
    Alvarez-Valle, Javier
    NATURE COMMUNICATIONS, 2024, 15 (01)
  • [39] TIM: Enabling Large-Scale White-Box Testing on In-App Deep Learning Models
    Wu, Hao
    Gong, Yuhang
    Ke, Xiaopeng
    Liang, Hanzhong
    Xu, Fengyuan
    Liu, Yunxin
    Zhong, Sheng
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 8188 - 8203
  • [40] Efficient Objective Functions for Coordinated Learning in Large-Scale Distributed OSA Systems
    NoroozOliaee, MohammadJavad
    Hamdaoui, Bechir
    Tumer, Kagan
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2013, 12 (05) : 931 - 944