High Performance Stencil Code Algorithms for GPGPUs

Cited by: 35

Authors:

Schaefer, Andreas [1]

Fey, Dietmar [1]

Affiliations:

[1] Univ Erlangen Nurnberg, Chair Comp Sci Comp Architecture 3, D-91054 Erlangen, Germany

Source:

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS) | 2011 / Vol. 4

Keywords:

stencil codes; GPU; high performance computing; temporal blocking; Jacobi solver; CUDA;

DOI:

10.1016/j.procs.2011.04.221

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

In this paper we investigate how stencil computations can be implemented on state-of-the-art general purpose graphics processing units (GPGPUs). Stencil codes can be found at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have gained a lot of attention recently because of their superior floating-point performance and memory bandwidth. Nevertheless, memory-bound stencil codes in particular have proven challenging for GPGPUs, yielding lower-than-expected speedups. We chose the Jacobi method as a standard benchmark to evaluate a set of algorithms on NVIDIA's latest Fermi chipset. One of our fastest algorithms is a parallel wavefront update. It exploits the enlarged on-chip shared memory to perform two time step updates per sweep. To the best of our knowledge, it represents the first successful application of temporal blocking for 3D stencils on GPGPUs and thereby exceeds previous results by a considerable margin. This is also the first paper to study stencil codes on Fermi.
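
The idea of performing two time step updates per sweep can be illustrated with a small sketch. This is not the paper's CUDA wavefront algorithm: it is a 1D, CPU-only Python analogue, and the function names `jacobi_step` and `two_steps_blocked`, the 3-point averaging stencil, and the tile size are all assumptions chosen for illustration. Each tile is copied together with a halo of width 2 into a local buffer (standing in for on-chip shared memory), advanced by two Jacobi steps locally, and only the tile's own cells are written back.

```python
def jacobi_step(u):
    """One Jacobi sweep of a 1D 3-point averaging stencil; the two end cells stay fixed."""
    n = len(u)
    return [
        u[i] if i in (0, n - 1) else (u[i - 1] + u[i] + u[i + 1]) / 3.0
        for i in range(n)
    ]


def two_steps_blocked(u, tile=4):
    """Advance the grid by two time steps in a single sweep over tiles.

    Hypothetical illustration of temporal blocking: each tile is loaded with a
    halo of width 2, so two local updates leave the tile's own cells correct.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        # Halo of width 2 on each side, clipped at the global boundary.
        lo, hi = max(start - 2, 0), min(stop + 2, n)
        local = u[lo:hi]  # stands in for a shared-memory buffer
        local = jacobi_step(jacobi_step(local))  # two time steps on the buffer
        # Only cells owned by this tile are valid after two local steps.
        out[start:stop] = local[start - lo:stop - lo]
    return out


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(16)]
    reference = jacobi_step(jacobi_step(grid))  # two separate global sweeps
    blocked = two_steps_blocked(grid, tile=4)   # one sweep, two steps per tile
    print(all(abs(a - b) < 1e-12 for a, b in zip(reference, blocked)))
```

The trade-off in this sketch is that each tile reads four extra halo cells and recomputes some intermediate values, in exchange for traversing the main array once instead of twice. That is the general motivation for temporal blocking in memory-bound codes.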


Pages: 2027-2036

Page count: 10

