MDLoader: A Hybrid Model-driven Data Loader for Distributed Deep Neural Networks Training

Cited by: 0
Authors
Bae, Jonghyun [1 ]
Choi, Jong Youl [2 ]
Pasini, Massimiliano Lupo [2 ]
Mehta, Kshitij [2 ]
Ibrahim, Khaled Z. [1 ]
Affiliations
[1] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
Keywords
One-sided communication; Collective communication; Graph Neural Network; Performance estimator;
DOI
10.1109/IPDPSW63119.2024.00203
CLC classification: TP3 [computing technology, computer technology]
Discipline code: 0812
Abstract
In this work, we propose MDLoader, a hybrid in-memory data loader for distributed deep neural network training. MDLoader introduces a model-driven performance estimator that automatically switches between one-sided and collective communication at runtime.
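To make the switching idea concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation): a performance estimator built from simple alpha-beta cost models predicts the per-batch cost of one-sided versus collective data movement and selects whichever mode it expects to be faster. All class names, coefficient values, and the cost-model form are illustrative assumptions.

# A minimal, hypothetical sketch of the idea described in the abstract:
# a model-driven estimator predicts the per-batch cost of one-sided vs.
# collective data movement and switches to the mode it expects to be faster.
# Coefficients and the alpha-beta cost form are illustrative assumptions.
import math
from dataclasses import dataclass


@dataclass
class CostModel:
    """Simple alpha-beta cost model: time = latency + bytes / bandwidth."""
    latency_s: float                 # fixed per-operation overhead (alpha)
    inv_bandwidth_s_per_byte: float  # per-byte transfer cost (beta)
    log_p_latency: bool = False      # collectives often pay ~log2(P) latency

    def predict(self, message_bytes: int, num_ranks: int) -> float:
        latency = self.latency_s
        if self.log_p_latency:
            latency *= math.log2(max(num_ranks, 2))
        return latency + self.inv_bandwidth_s_per_byte * message_bytes


class HybridModeSelector:
    """Picks the communication mode with the lower predicted batch time."""

    def __init__(self) -> None:
        # Illustrative starting coefficients; a real estimator would calibrate
        # them from micro-benchmarks or timings observed during training.
        self.models = {
            "one-sided": CostModel(latency_s=5e-6,
                                   inv_bandwidth_s_per_byte=1.0 / 10e9),
            "collective": CostModel(latency_s=20e-6,
                                    inv_bandwidth_s_per_byte=1.0 / 12e9,
                                    log_p_latency=True),
        }

    def choose(self, batch_bytes: int, num_ranks: int) -> str:
        predicted = {mode: m.predict(batch_bytes, num_ranks)
                     for mode, m in self.models.items()}
        return min(predicted, key=predicted.get)

    def update(self, mode: str, batch_bytes: int, observed_s: float) -> None:
        # Crude online refinement: nudge the per-byte coefficient toward the
        # effective rate observed for the last batch (moving average).
        model = self.models[mode]
        observed_beta = max(observed_s - model.latency_s, 1e-9) / max(batch_bytes, 1)
        model.inv_bandwidth_s_per_byte = (
            0.9 * model.inv_bandwidth_s_per_byte + 0.1 * observed_beta
        )


if __name__ == "__main__":
    selector = HybridModeSelector()
    for batch_bytes in (64 * 1024, 16 * 1024 * 1024):
        mode = selector.choose(batch_bytes, num_ranks=128)
        print(f"{batch_bytes} B per rank -> predicted faster mode: {mode}")

In a loader of the kind described, such a selector would be consulted at runtime (for example once per epoch or per batch) and refined with observed timings, which is the role the abstract assigns to the model-driven performance estimator.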
Pages: 1193-1195
Page count: 3