DDLBench: Towards a Scalable Benchmarking Infrastructure for Distributed Deep Learning

Cited by: 6
Authors
Jansen, Matthijs [1, 2]
Codreanu, Valeriu [1]
Varbanescu, Ana-Lucia [2]
Affiliations
[1] SURFsara, Amsterdam, Netherlands
[2] Univ Amsterdam, Amsterdam, Netherlands
DOI
10.1109/DLS51937.2020.00009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Due to its many applications across various fields of research, engineering, and daily life, deep learning has seen a surge in popularity. Therefore, larger and more expressive models have been proposed, with examples like Turing-NLG using as many as 17 billion parameters. Training these very large models becomes increasingly difficult due to the high computational costs and large memory footprint. Therefore, several approaches for distributed training based on data parallelism (e.g., Horovod) and model/pipeline parallelism (e.g., GPipe, PipeDream) have emerged. In this work, we focus on an in-depth comparison of three different parallelism models that address these needs: data, model and pipeline parallelism. To this end, we provide an analytical comparison of the three, both in terms of computation time and memory usage, and introduce DDLBench, a comprehensive (open-source, ready-to-use) benchmark suite to quantify these differences in practice. Through in-depth performance analysis and experimentation with various models, datasets, distribution models and hardware systems, we demonstrate that DDLBench can accurately quantify the capability of a given system to perform distributed deep learning (DDL). By comparing our analytical models with the benchmarking results, we show how the performance of real-life implementations diverges from these analytical models, thus requiring benchmarking to capture the in-depth complexity of the frameworks themselves.
Pages: 31-39
Number of pages: 9