High Performance Stencil Code Algorithms for GPGPUs

Cited by: 35

Authors:

Schaefer, Andreas [1]

Fey, Dietmar [1]

Affiliations:

[1] Univ Erlangen Nurnberg, Chair Comp Sci Comp Architecture 3, D-91054 Erlangen, Germany

Source:

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS) | 2011 / Vol. 4

Keywords:

stencil codes; GPU; high performance computing; temporal blocking; Jacobi solver; CUDA

DOI:

10.1016/j.procs.2011.04.221

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In this paper we investigate how stencil computations can be implemented on state-of-the-art general-purpose graphics processing units (GPGPUs). Stencil codes can be found at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have gained a lot of attention recently because of their superior floating-point performance and memory bandwidth. Nevertheless, memory-bound stencil codes in particular have proven to be challenging for GPGPUs, yielding lower-than-expected speedups. We chose the Jacobi method as a standard benchmark to evaluate a set of algorithms on NVIDIA's latest Fermi chipset. One of our fastest algorithms is a parallel wavefront update. It exploits the enlarged on-chip shared memory to perform two time step updates per sweep. To the best of our knowledge, it represents the first successful application of temporal blocking for 3D stencils on GPGPUs and thereby exceeds previous results by a considerable margin. It is also the first paper to study stencil codes on Fermi.

Pages: 2027-2036

Page count: 10

