CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs

Cited by: 0

Authors:

DeHao Chen

WenGuang Chen

WeiMin Zheng

Affiliation:

[1] Tsinghua University, Department of Computer Science and Technology

Source:

Science China Information Sciences | 2012 / Vol. 55

Keywords:

CUDA; parallelization; data access pattern; multi-GPU

DOI:

Not available

CLC number:

Subject classification code:

Abstract:

With the growing prevalence of general-purpose computation on GPUs, shared memory programming models have been proposed to ease the pain of GPU programming. However, as workloads become more intensive, it is desirable to port GPU programs to more scalable distributed memory environments, such as multi-GPU systems. To achieve this, programs need to be rewritten with mixed programming models (e.g., CUDA and message passing). Programmers must work carefully not only on workload distribution but also on scheduling mechanisms to ensure efficient execution. In this paper, we study the possibility of automating parallelization for multi-GPUs. Starting from a GPU program written in the shared memory model, our framework analyzes the access patterns of arrays in kernel functions to derive data partitioning schemes. To acquire the access patterns, we propose a three-tier approach: static analysis, profile-based analysis, and user annotation. Experiments show that most access patterns can be derived correctly by the first two tiers, which means that in those cases zero effort is needed to port an existing application to a distributed memory environment. We use our framework to parallelize several applications and show that, for certain kinds of applications, CUDA-Zero can achieve efficient parallelization in a multi-GPU environment.
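The abstract's central idea, deriving a data partition for each GPU from the array indices that each thread touches, can be illustrated with a small sketch. This is not CUDA-Zero's implementation: the affine access model `A[coef * tid + offset]`, the contiguous block split of thread IDs, and every name below (`AffineAccess`, `split_threads`, `partition_from_affine`, `partition_from_profile`) are hypothetical assumptions chosen for illustration. `partition_from_affine` loosely stands in for the static-analysis tier, and `partition_from_profile` stands in for the profile-based tier by reading indices recorded per thread during a trial run.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AffineAccess:
    """Hypothetical model: thread `tid` accesses A[coef * tid + offset]."""
    coef: int
    offset: int

    def index(self, tid: int) -> int:
        return self.coef * tid + self.offset


def split_threads(n_threads: int, n_gpus: int) -> list[range]:
    """Split thread IDs 0..n_threads-1 into n_gpus contiguous, near-equal chunks."""
    base, extra = divmod(n_threads, n_gpus)
    chunks, start = [], 0
    for g in range(n_gpus):
        size = base + (1 if g < extra else 0)  # first `extra` GPUs get one more thread
        chunks.append(range(start, start + size))
        start += size
    return chunks


def partition_from_affine(access: AffineAccess, n_threads: int,
                          n_gpus: int) -> list[tuple[int, int]]:
    """Static-tier sketch: half-open index interval [lo, hi) each GPU touches."""
    parts = []
    for chunk in split_threads(n_threads, n_gpus):
        if len(chunk) == 0:
            parts.append((0, 0))  # GPU with no threads needs no data
            continue
        # An affine function is monotone, so its extremes lie at the chunk endpoints.
        a, b = access.index(chunk.start), access.index(chunk.stop - 1)
        parts.append((min(a, b), max(a, b) + 1))
    return parts


def partition_from_profile(trace: dict[int, list[int]], n_threads: int,
                           n_gpus: int) -> list[tuple[int, int]]:
    """Profile-tier sketch: bound the indices each thread was observed to touch."""
    parts = []
    for chunk in split_threads(n_threads, n_gpus):
        touched = [i for tid in chunk for i in trace.get(tid, [])]
        parts.append((min(touched), max(touched) + 1) if touched else (0, 0))
    return parts
```

For example, with 8 threads on 2 GPUs, `partition_from_affine(AffineAccess(1, 0), 8, 2)` gives `[(0, 4), (4, 8)]`, while a profiled 3-point stencil trace (`tid - 1`, `tid`, `tid + 1`, clipped to `[0, 8)`) gives `[(0, 5), (3, 8)]`. In this sketch, the overlap (indices 3 and 4) marks halo elements that would have to be exchanged between the two GPUs.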


Pages: 663-676

Page count: 13

Related records: 41 in total

[1] CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs
Chen, DeHao
Chen, WenGuang
Zheng, WeiMin
[J]. Science China Information Sciences, 2012, 55 (03): 663-676
[2] CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs
Chen, DeHao
[J]. Science China Information Sciences, 2012, 55 (03): 663-676
[3] Research on Multi-GPUs Image Processing Acceleration Based CUDA
Gao, Song
Gao, Biao
Xiao, Qinkun
Wang, Haiyun
[J]. 2012 International Conference on Industrial Control and Electronics Engineering (ICICEE), 2012: 196-199
[4] Efficient Parallel Knuth-Morris-Pratt Algorithm for Multi-GPUs with CUDA
Lin, K.-J.
[J]. Springer Science and Business Media Deutschland GmbH, 2013, (21)
[5] CUDA ClustalW: An efficient parallel algorithm for progressive multiple sequence alignment on Multi-GPUs
Hung, Che-Lun
Lin, Yu-Shiang
Lin, Chun-Yuan
Chung, Yeh-Ching
Chung, Yi-Fang
[J]. Computational Biology and Chemistry, 2015, 58: 62-68
[6] Data-Oriented Runtime Scheduling Framework on Multi-GPUs
Li, Tao
Zhao, Kezhao
Dong, Qiankun
Leng, Jiabing
Yang, Yulu
Ma, Wenjing
[J]. 2016 IEEE Trustcom/BigDataSE/ISPA, 2016: 1311-1318
[7] A Framework for Direct and Transparent Data Exchange of Filter-stream Applications in Multi-GPUs Architectures
Ramos, Gabriel
Andrade, Guilherme
Sachetto, Rafael
Madeira, Daniel
Carvalho, Renan
Ferreira, Renato
Mourao, Fernando
Rocha, Leonardo
[J]. International Conference on Computational Science (ICCS 2017), 2017, 108: 1642-1651
[8] An improved mixed Lagrangian-Eulerian (IMLE) method for modelling incompressible Navier-Stokes flows with CUDA programming on multi-GPUs
Liu, Rex Kuan-Shuo
Wu, Cheng-Tao
Kao, Neo Shih-Chao
Sheu, Tony Wen-Hann
[J]. Computers & Fluids, 2019, 184: 99-106
[9] WholeGraph: A Fast Graph Neural Network Training Framework with Multi-GPU Distributed Shared Memory Architecture
Yang, Dongxu
Liu, Junhong
Qi, Jiaxing
Lai, Junjie
[J]. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, 2022
[10] Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters
Lai, Jianqi
Yu, Hang
Tian, Zhengyu
Li, Hua
[J]. Scientific Programming, 2020, 2020
