SkePU: A Multi-Backend Skeleton Programming Library for Multi-GPU Systems

Authors
Enmyren, Johan [1]
Kessler, Christoph W. [1]
Affiliation
[1] Linkoping Univ, Dept Comp & Informat Sci, PELAB, S-58183 Linkoping, Sweden
Keywords
Skeleton Programming; GPU; CUDA; OpenCL; Data Parallelism
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present SkePU, a C++ template library that provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU backend and a parallel OpenMP backend. It also supports multi-GPU systems. Copying data between host memory and GPU device memory can be a performance bottleneck. A key technique in SkePU is lazy memory copying, implemented in the container type used to represent skeleton operands, which makes it possible to avoid unnecessary memory transfers. We evaluate SkePU with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that a skeleton approach to GPU programming is viable, especially when the computational load is large compared to memory I/O (lazy memory copying can help achieve this). They also show that using several GPUs has potential for performance gains. On a more complex and realistic task such as ODE solving, SkePU offers good performance: run times with a GPU backend are up to 10 times faster than those of a sequential solver running on a fast CPU.
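The lazy-copying idea can be illustrated with a small C++ sketch. It is not SkePU's container or API: the class `LazyVector`, the skeleton function `mapOnDevice`, and all member names are hypothetical, and "device memory" is simulated by a second `std::vector` so the example runs without a GPU. Each operand tracks whether its host copy and its device copy are up to date, and a transfer happens only when a stale copy is about to be read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operand container illustrating lazy memory copying.
// "Device" memory is simulated by a second std::vector; a real backend
// would allocate and transfer with CUDA or OpenCL instead.
template <typename T>
class LazyVector {
public:
    explicit LazyVector(std::size_t n, T init = T{})
        : host_(n, init), device_(n), hostValid_(true), deviceValid_(false) {}

    std::size_t size() const { return host_.size(); }

    // Read-only host access: download only if the host copy is stale.
    const T& read(std::size_t i) {
        assert(i < host_.size());
        syncToHost();
        return host_[i];
    }

    // Writable host access: the caller may modify the element, so the
    // device copy must be treated as stale afterwards.
    T& write(std::size_t i) {
        assert(i < host_.size());
        syncToHost();
        deviceValid_ = false;
        return host_[i];
    }

    // Device buffer for a skeleton that reads this operand:
    // upload only if the device copy is stale.
    std::vector<T>& deviceIn() {
        if (!deviceValid_) {
            device_ = host_;  // simulated host-to-device transfer
            ++uploads;
            deviceValid_ = true;
        }
        return device_;
    }

    // Device buffer for a skeleton that overwrites this operand: no upload
    // is needed, but the host copy becomes stale.
    std::vector<T>& deviceOut() {
        deviceValid_ = true;
        hostValid_ = false;
        return device_;
    }

    std::size_t uploads = 0;    // number of simulated host-to-device copies
    std::size_t downloads = 0;  // number of simulated device-to-host copies

private:
    void syncToHost() {
        if (!hostValid_) {
            host_ = device_;  // simulated device-to-host transfer
            ++downloads;
            hostValid_ = true;
        }
    }

    std::vector<T> host_;
    std::vector<T> device_;
    bool hostValid_;
    bool deviceValid_;
};

// Hypothetical Map skeleton executed on the simulated device:
// out[i] = f(in[i]) for every element; in and out may be the same object.
template <typename T, typename F>
void mapOnDevice(F f, LazyVector<T>& in, LazyVector<T>& out) {
    assert(in.size() == out.size());
    std::vector<T>& src = in.deviceIn();
    std::vector<T>& dst = out.deviceOut();
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = f(src[i]);
    }
}
```

In this sketch, chaining `mapOnDevice(f, a, b)`, `mapOnDevice(f, b, c)` and `mapOnDevice(f, c, c)` and then calling `c.read(0)` triggers one simulated upload (of `a`) and one simulated download (of `c`), whereas copying operands to the device and results back on every skeleton call would cost three of each. A real backend would replace the vector assignments with CUDA or OpenCL transfer calls.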
Pages: 5-14
Number of pages: 10