A hybrid GPU cluster and volunteer computing platform for scalable deep learning

Cited by: 13

Authors:

Kijsipongse, Ekasit [1]

Piyatumrong, Apivadee [1]

U-ruekolan, Suriya [1]

Affiliations:

[1] Natl Elect & Comp Technol Ctr NECTEC, Large Scale Simulat Res Lab, 112 Thailand Sci Pk,Pahon Yothin Rd,Klong 1, Klongluang 12120, Pathumthani, Thailand

Source:

JOURNAL OF SUPERCOMPUTING | 2018 / Vol. 74 / Issue 07

Keywords:

Cluster computing; Volunteer computing; Deep learning;

DOI:

10.1007/s11227-018-2375-9

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep learning is a highly compute-intensive and time-consuming task. Training a sophisticated model within a reasonable time requires far more computing resources than a single machine can provide. Normally, GPU clusters are required to reduce the training time of a deep learning model from days to hours. However, building large dedicated GPU clusters is not always feasible, and may even be ineffective, for most organizations because of the cost of purchase, operation and maintenance, while such systems are not fully utilized all the time. Volunteer computing can address this problem, as it provides additional computing resources at little or no cost. This work presents a hybrid cluster and volunteer computing platform that scales out GPU clusters into volunteer computing for distributed deep learning. The owners of the machines contribute unused computing resources on their computers to extend the capability of the GPU cluster. The challenge is to seamlessly reconcile the differences between GPU cluster and volunteer computing systems so as to ensure scalability transparency, while performance is another major concern. We validate the proposed work with two well-known sample cases. The results show that the hybrid platform is used efficiently, achieving sub-linear speedup.


Pages: 3236-3263

Page count: 28
