A hybrid GPU cluster and volunteer computing platform for scalable deep learning

Cited by: 13
Authors
Kijsipongse, Ekasit [1]
Piyatumrong, Apivadee [1]
U-ruekolan, Suriya [1]
Affiliations
[1] Natl Elect & Comp Technol Ctr NECTEC, Large Scale Simulat Res Lab, 112 Thailand Sci Pk,Pahon Yothin Rd,Klong 1, Klongluang 12120, Pathumthani, Thailand
Source
JOURNAL OF SUPERCOMPUTING | 2018 / Vol. 74 / Issue 07
Keywords
Cluster computing; Volunteer computing; Deep learning
DOI
10.1007/s11227-018-2375-9
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning is a highly compute-intensive and time-consuming task. Training a sophisticated model within a reasonable time requires far more computing resources than a single machine can provide. Normally, GPU clusters are required to reduce the training time of a deep learning model from days to hours. However, building large dedicated GPU clusters is not always feasible, and may even be inefficient, for most organizations because of the cost of purchasing, operation, and maintenance, especially since such systems are not fully utilized all the time. Volunteer computing can address this problem, as it provides additional computing resources at little or no cost. This work presents a hybrid cluster and volunteer computing platform that scales out GPU clusters into volunteer computing for distributed deep learning. The owners of the machines contribute unused computing resources on their computers to extend the capability of the GPU cluster. The challenge is to seamlessly reconcile the differences between GPU cluster and volunteer computing systems so as to ensure scalability transparency, while performance is another major concern. We validate the proposed work with two well-known sample cases. The results show that the hybrid platform is used efficiently, achieving sub-linear speedup.
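As a rough illustration of what "sub-linear speedup" means, the sketch below computes speedup and parallel efficiency from two training times. It is not taken from the paper: the function names and the timings in the usage note are hypothetical, and the definitions are the standard ones (speedup = baseline time / scaled time; efficiency = speedup / resource scale factor).

```python
def speedup(t_baseline: float, t_scaled: float) -> float:
    """Ratio of baseline training time to training time on the scaled-out platform."""
    return t_baseline / t_scaled


def efficiency(t_baseline: float, t_scaled: float, scale_factor: float) -> float:
    """Speedup divided by how many times more resources were used (1.0 would be linear)."""
    return speedup(t_baseline, t_scaled) / scale_factor


def is_sublinear(t_baseline: float, t_scaled: float, scale_factor: float) -> bool:
    """True when adding resources helps (speedup > 1) but by less than the scale factor."""
    s = speedup(t_baseline, t_scaled)
    return 1.0 < s < scale_factor


# Hypothetical timings, NOT results from the paper:
# 10 hours on the baseline, 4 hours with 4x the workers.
s = speedup(10.0, 4.0)
e = efficiency(10.0, 4.0, 4)
print(f"speedup={s}, efficiency={e}, sub-linear={is_sublinear(10.0, 4.0, 4)}")
```

With these made-up numbers, quadrupling the resources gives a speedup of 2.5 rather than 4, which is the kind of outcome the term "sub-linear" describes.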
Pages: 3236-3263
Page count: 28