SINGA: A Distributed Deep Learning Platform

Times cited: 78
Authors
Ooi, Beng Chin [1]
Tan, Kian-Lee [1]
Wang, Sheng [1]
Wang, Wei [1]
Cai, Qingchao [1]
Chen, Gang [2]
Gao, Jinyang [1]
Luo, Zhaojing [1]
Tung, Anthony K. H. [1]
Wang, Yuan [3]
Xie, Zhongle [1]
Zhang, Meihui [4]
Zheng, Kaiping [1]
Affiliations
[1] Natl Univ Singapore, Singapore 117548, Singapore
[2] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[3] NetEase Inc, Hangzhou, Zhejiang, Peoples R China
[4] Singapore Univ Technol & Design, Singapore, Singapore
Keywords
Deep learning; Distributed training
DOI
10.1145/2733373.2807410
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Deep learning has shown outstanding performance in various machine learning tasks. However, deep and complex model structures, combined with massive training data, make training expensive. In this paper, we present SINGA, a distributed deep learning system for training big models over large datasets. SINGA provides an intuitive programming model based on the layer abstraction, which supports a variety of popular deep learning models. The SINGA architecture supports both synchronous and asynchronous training frameworks, and hybrid training frameworks can also be customized to achieve good scalability. In addition, SINGA provides different neural net partitioning schemes for training large models. SINGA is an Apache Incubator project released under the Apache License 2.0.
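The abstract distinguishes synchronous from asynchronous training. A small, deterministic simulation can illustrate the difference. This is a generic sketch of data-parallel SGD against a single parameter store, not SINGA's API or its actual training frameworks. The toy objective, the function names (`local_gradient`, `synchronous_sgd`, `asynchronous_sgd`), the learning rates, and the simulated staleness are all hypothetical choices made for illustration.

```python
from typing import List

# Hypothetical toy objective: each worker holds one data shard and minimises
# 0.5 * mean((w - x)^2) over it, so its local gradient is w - mean(shard).
def local_gradient(w: float, shard: List[float]) -> float:
    return w - sum(shard) / len(shard)


def synchronous_sgd(shards: List[List[float]], w0: float = 0.0,
                    lr: float = 0.5, steps: int = 50) -> float:
    """Synchronous sketch: every worker computes a gradient on the SAME
    parameters, the gradients are averaged, and one update is applied per
    step (an implicit barrier between steps)."""
    w = w0
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]
        w -= lr * sum(grads) / len(grads)
    return w


def asynchronous_sgd(shards: List[List[float]], w0: float = 0.0,
                     lr: float = 0.1, rounds: int = 50,
                     staleness: int = 1) -> float:
    """Asynchronous sketch: workers push gradients one at a time (round-robin)
    and each gradient is applied immediately, with no barrier. Staleness is
    simulated deterministically: a worker's gradient is computed on the
    parameter version that is `staleness` updates older than the latest."""
    history = [w0]  # every parameter version produced so far
    for t in range(rounds * len(shards)):
        shard = shards[t % len(shards)]
        stale_w = history[max(0, len(history) - 1 - staleness)]
        history.append(history[-1] - lr * local_gradient(stale_w, shard))
    return history[-1]


if __name__ == "__main__":
    shards = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # shard means 2.0 and 5.0
    print("synchronous: ", synchronous_sgd(shards))
    print("asynchronous:", asynchronous_sgd(shards))
```

With these two shards the global optimum is their mean, 3.5. The synchronous run converges to it essentially exactly. The asynchronous run settles in a small neighbourhood of it, because each update uses a single worker's gradient computed on stale parameters. Removing the barrier trades some update quality for less waiting between workers. The hybrid frameworks mentioned in the abstract mix the two styles and are not modeled here.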
Pages: 685-688
Number of pages: 4