SkePU: A Multi-Backend Skeleton Programming Library for Multi-GPU Systems

被引：0

作者：

Enmyren, Johan ^{[1
]}

Kessler, Christoph W. ^{[1
]}

机构：

[1] Linkoping Univ, Dept Comp & Informat Sci, PELAB, S-58183 Linkoping, Sweden

来源：

HLPP 2010: PROCEEDINGS OF THE FOURTH INTERNATIONAL WORKSHOP ON HIGH-LEVEL PARALLEL PROGRAMMING AND APPLICATIONS | 2010年

关键词：

Skeleton Programming; GPU; CUDA; OpenCL; Data Parallelism;

D O I：

暂无

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

We present SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU systems. Copying data between the host and the GPU device memory can be a performance bottleneck. A key technique in SkePU is the implementation of lazy memory copying in the container type used to represent skeleton operands, which allows to avoid unnecessary memory transfers. We evaluate SkePU with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that a skeleton approach to GPU programming is viable, especially when the computation burden is large compared to memory I/O (the lazy memory copying can help to achieve this). It also shows that utilizing several GPUs have a potential for performance gains. We see that SkePU offers good performance with a more complex and realistic task such as ODE solving, with up to 10 times faster run times when using SkePU with a GPU backend compared to a sequential solver running on a fast CPU.

引用

页码：5 / 14

页数：10

共 50 条

[31] Efficient Solving of Scan Primitive on Multi-GPU Systems
Dieguez, Adrian P.
Amor, Margarita
Doallo, Ramon
Nukada, Akira
Matsuoka, Satoshi
[J]. 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2018, : 794 - 803
[32] Simulating cortical networks on heterogeneous multi-GPU systems
Nere, Andrew
Franey, Sean
Hashmi, Atif
Lipasti, Mikko
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (07) : 953 - 971
[33] Accelerated MR Physics Simulations on multi-GPU systems
Xanthis, Christos G.
Venetis, Ioannis E.
Aletras, Anthony H.
[J]. 2013 IEEE 13TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2013,
[34] Performance Optimization of Allreduce Operation for Multi-GPU Systems
Nukada, Akira
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3107 - 3112
[35] Autonomous Execution for Multi-GPU Systems: Compiler Support
Koç University, Istanbul, Turkey
不详
CA, United States
[J]. Proc. SC -W: Workshops Int. Conf. High Perform. Comput., Netw., Storage Anal., (1129-1140):
[36] Efficient breadth first search on multi-GPU systems
Mastrostefano, Enrico
Bernaschi, Massimo
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (09) : 1292 - 1305
[37] Dynamic load balancing on heterogeneous multi-GPU systems
Acosta, Alejandro
Blanco, Vicente
Almeida, Francisco
[J]. COMPUTERS & ELECTRICAL ENGINEERING, 2013, 39 (08) : 2591 - 2602
[38] MAPREDUCE IMPLEMENTATION WITH MULTI-GPU
Chen, Yi
Chen, Su
Jiang, Hai
[J]. INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE & TECHNOLOGY: PROCEEDINGS, 2012, : 21 - 25
[39] Tensor Movement Orchestration in Multi-GPU Training Systems
Lin, Shao-Fu
Chen, Yi-Jung
Cheng, Hsiang-Yun
Yang, Chia-Lin
[J]. 2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, 2023, : 1140 - 1152
[40] Gossip: Efficient Communication Primitives for Multi-GPU Systems
Kobus, Robin
Juenger, Daniel
Hundt, Christian
Schmidt, Bertil
[J]. PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019,

← 1 2 3 4 5 →