MDLoader: A Hybrid Model-driven Data Loader for Distributed Deep Neural Networks Training

Cited by: 0
Authors
Bae, Jonghyun [1 ]
Choi, Jong Youl [2 ]
Pasini, Massimiliano Lupo [2 ]
Mehta, Kshitij [2 ]
Ibrahim, Khaled Z. [1 ]
Affiliations
[1] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
Keywords
One-sided communication; Collective communication; Graph Neural Network; Performance estimator;
DOI
10.1109/IPDPSW63119.2024.00203
CLC classification: TP3 [computing technology, computer technology]
Discipline code: 0812
Abstract
In this work, we propose MDLoader, a hybrid in-memory data loader for distributed deep neural network training. MDLoader introduces a model-driven performance estimator that automatically switches between one-sided and collective communication at runtime.
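To make the switching idea concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation): a performance estimator built from simple alpha-beta cost models predicts the per-batch cost of one-sided versus collective data movement and selects whichever mode it expects to be faster. All class names, coefficient values, and the cost-model form are illustrative assumptions.

# A minimal, hypothetical sketch of the idea described in the abstract:
# a model-driven estimator predicts the per-batch cost of one-sided vs.
# collective data movement and switches to the mode it expects to be faster.
# Coefficients and the alpha-beta cost form are illustrative assumptions.
import math
from dataclasses import dataclass


@dataclass
class CostModel:
    """Simple alpha-beta cost model: time = latency + bytes / bandwidth."""
    latency_s: float                 # fixed per-operation overhead (alpha)
    inv_bandwidth_s_per_byte: float  # per-byte transfer cost (beta)
    log_p_latency: bool = False      # collectives often pay ~log2(P) latency

    def predict(self, message_bytes: int, num_ranks: int) -> float:
        latency = self.latency_s
        if self.log_p_latency:
            latency *= math.log2(max(num_ranks, 2))
        return latency + self.inv_bandwidth_s_per_byte * message_bytes


class HybridModeSelector:
    """Picks the communication mode with the lower predicted batch time."""

    def __init__(self) -> None:
        # Illustrative starting coefficients; a real estimator would calibrate
        # them from micro-benchmarks or timings observed during training.
        self.models = {
            "one-sided": CostModel(latency_s=5e-6,
                                   inv_bandwidth_s_per_byte=1.0 / 10e9),
            "collective": CostModel(latency_s=20e-6,
                                    inv_bandwidth_s_per_byte=1.0 / 12e9,
                                    log_p_latency=True),
        }

    def choose(self, batch_bytes: int, num_ranks: int) -> str:
        predicted = {mode: m.predict(batch_bytes, num_ranks)
                     for mode, m in self.models.items()}
        return min(predicted, key=predicted.get)

    def update(self, mode: str, batch_bytes: int, observed_s: float) -> None:
        # Crude online refinement: nudge the per-byte coefficient toward the
        # effective rate observed for the last batch (moving average).
        model = self.models[mode]
        observed_beta = max(observed_s - model.latency_s, 1e-9) / max(batch_bytes, 1)
        model.inv_bandwidth_s_per_byte = (
            0.9 * model.inv_bandwidth_s_per_byte + 0.1 * observed_beta
        )


if __name__ == "__main__":
    selector = HybridModeSelector()
    for batch_bytes in (64 * 1024, 16 * 1024 * 1024):
        mode = selector.choose(batch_bytes, num_ranks=128)
        print(f"{batch_bytes} B per rank -> predicted faster mode: {mode}")

In a loader of the kind described, such a selector would be consulted at runtime (for example once per epoch or per batch) and refined with observed timings, which is the role the abstract assigns to the model-driven performance estimator.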
Pages: 1193-1195
Page count: 3