Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Cited by: 4
Authors
Hasheminezhad, Bita [1]
Shirzad, Shahrzad [1]
Wu, Nanmiao [1]
Diehl, Patrick [1]
Schulz, Hannes [2]
Kaiser, Hartmut [1]
Affiliations
[1] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
[2] Microsoft Res Montreal, Montreal, PQ, Canada
Keywords
Distributed Deep Learning; High Performance Computing; HPX; Asynchronous Many-task System
DOI
10.1109/DLS51937.2020.00008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although recent scale-up approaches to training deep neural networks have proven effective, the computational intensity of large and complex models and the availability of large-scale datasets require deep learning frameworks to use scale-out techniques as well. Most available distributed deep learning frameworks did not consider parallelization approaches and distribution requirements in their primary designs, and most still cannot perform effective and efficient fine-grained inter-node communication. We present Phylanx, which has the potential to alleviate these shortcomings. Phylanx provides a productivity-oriented frontend in which user Python code is translated to a futurized execution tree that can be executed efficiently on multiple nodes using HPX, the C++ standard library for parallelism and concurrency, leveraging fine-grained threading and an active-messaging, task-based runtime system.
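The idea of a "futurized execution tree" can be illustrated with a minimal, single-process sketch. It uses only Python's standard `concurrent.futures` module and is not Phylanx or HPX code; the names `futurize` and `evaluate` and the tuple-based tree encoding are assumptions made up for this illustration. Each node of an expression tree becomes a future, so independent subtrees may be evaluated concurrently and a parent waits only on its own inputs.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import operator


def futurize(node, pool):
    """Turn a nested (op, left, right) tuple into a Future for its value.

    Hypothetical helper for illustration only; not part of Phylanx or HPX.
    """
    if not isinstance(node, tuple):
        # Leaf: wrap the constant in an already-completed future.
        done = Future()
        done.set_result(node)
        return done
    op, left, right = node
    # Children are submitted before the parent, so independent
    # subtrees can run concurrently on the pool's worker threads.
    lhs = futurize(left, pool)
    rhs = futurize(right, pool)
    # The parent task blocks only on the results of its own children.
    return pool.submit(lambda: op(lhs.result(), rhs.result()))


def evaluate(tree, workers=4):
    """Evaluate the whole tree and return the root value."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return futurize(tree, pool).result()


# Expression tree for (2 + 3) * (4 - 1).
tree = (operator.mul, (operator.add, 2, 3), (operator.sub, 4, 1))
result = evaluate(tree)
```

A real asynchronous many-task runtime would attach continuations to futures rather than block worker threads, and would place nodes of the tree on different machines; this sketch only shows the dependency structure.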
Pages: 20-30
Page count: 11