Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Cited by: 4
Authors
Hasheminezhad, Bita [1]
Shirzad, Shahrzad [1]
Wu, Nanmiao [1]
Diehl, Patrick [1]
Schulz, Hannes [2]
Kaiser, Hartmut [1]
Affiliations
[1] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
[2] Microsoft Res Montreal, Montreal, PQ, Canada
Keywords
Distributed Deep Learning; High Performance Computing; HPX; Asynchronous Many-task System
DOI
10.1109/DLS51937.2020.00008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although recent scale-up approaches to training deep neural networks have proven effective, the computational intensity of large and complex models, together with the availability of large-scale datasets, requires deep learning frameworks to use scale-out techniques. Most available distributed deep learning frameworks did not consider parallelization approaches and distribution requirements in their primary designs, and most still cannot perform effective and efficient fine-grained inter-node communication. We present Phylanx, which has the potential to alleviate these shortcomings. Phylanx provides a productivity-oriented frontend in which user Python code is translated into a futurized execution tree that can be executed efficiently on multiple nodes using HPX, the C++ standard library for parallelism and concurrency, leveraging fine-grained threading and an active-messaging, task-based runtime system.
Pages: 20-30
Number of pages: 11
Related papers
50 in total
  • [31] Fast and scalable all-optical network architecture for distributed deep learning
    Li, Wenzhe
    Yuan, Guojun
    Wang, Zhan
    Tan, Guangming
    Zhang, Peiheng
    Rouskas, George N.
    JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2024, 16 (03) : 342 - 357
  • [32] Towards an infrastructure for MLS distributed computing
    Kang, MH
    Froscher, JN
    Eppinger, BJ
    14TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, PROCEEDINGS, 1998, : 91 - 100
  • [33] EasyTransfer: A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
    Qiu, Minghui
    Li, Peng
    Wang, Chengyu
    Pan, Haojie
    Wang, Ang
    Chen, Cen
    Jia, Xianyan
    Li, Yaliang
    Huang, Jun
    Cai, Deng
    Lin, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 4075 - 4084
  • [34] Infinity: A Scalable Infrastructure for In-Network Applications
    Abranches, Marcelo
    Olson, Karl
    Keller, Eric
    2021 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2021), 2021, : 1050 - 1053
  • [35] Overview of Image Datasets for Deep Learning Applications in Diagnostics of Power Infrastructure
    Ruszczak, Bogdan
    Michalski, Pawel
    Tomaszewski, Michal
    SENSORS, 2023, 23 (16)
  • [36] The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications
    Agelastos, Anthony
    Allan, Benjamin
    Brandt, Jim
    Cassella, Paul
    Enos, Jeremy
    Fullop, Joshi
    Gentile, Ann
    Monk, Steve
    Naksinehaboon, Nichamon
    Ogden, Jeff
    Rajan, Mahesh
    Showerman, Michael
    Stevenson, Joel
    Taerat, Narate
    Tucker, Tom
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 154 - 165
  • [37] A scalable and robust framework for distributed applications
    Jelasity, M
    Preuss, M
    Paechter, B
    CEC'02: PROCEEDINGS OF THE 2002 CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1 AND 2, 2002, : 1540 - 1545
  • [38] Software engineering for scalable distributed applications
    van Steen, M
    van der Zijden, S
    Sips, HJ
    TWENTY-SECOND ANNUAL INTERNATIONAL COMPUTER SOFTWARE & APPLICATIONS CONFERENCE - PROCEEDINGS, 1998, : 285 - 292
  • [39] TOWARDS DISTRIBUTED APPLICATIONS
    BELISLE, P
    VONBECHTOLSHEIM, M
    BOURDON, F
    FENG, Z
    KLACZKORYNDZIUN, S
    STEFFENS, J
    KOVACS, G
    LUKAS, K
    STAUDENMAIER, M
    IFIP TRANSACTIONS C-COMMUNICATION SYSTEMS, 1992, 1 : 395 - 396
  • [40] Deep Learning for Distributed Optimization: Applications to Wireless Resource Management
    Lee, Hoon
    Lee, Sang Hyun
    Quek, Tony Q. S.
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2019, 37 (10) : 2251 - 2266