Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Cited by: 4
Authors
Hasheminezhad, Bita [1]
Shirzad, Shahrzad [1]
Wu, Nanmiao [1]
Diehl, Patrick [1]
Schulz, Hannes [2]
Kaiser, Hartmut [1]
Affiliations
[1] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
[2] Microsoft Res Montreal, Montreal, PQ, Canada
Keywords
Distributed Deep Learning; High Performance Computing; HPX; Asynchronous Many-task System
DOI
10.1109/DLS51937.2020.00008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although recent scale-up approaches to training deep neural networks have proven effective, the computational intensity of large, complex models and the availability of large-scale datasets require deep learning frameworks to adopt scale-out techniques. Most available distributed deep learning frameworks did not consider parallelization approaches and distribution requirements in their primary designs, and most still cannot perform effective and efficient fine-grained inter-node communication. We present Phylanx, which has the potential to alleviate these shortcomings. Phylanx offers a productivity-oriented frontend in which user Python code is translated into a futurized execution tree that can be executed efficiently on multiple nodes using HPX, the C++ standard library for parallelism and concurrency, leveraging fine-grained threading and an active-messaging, task-based runtime system.
Pages: 20-30
Number of pages: 11