High Performance Stencil Code Algorithms for GPGPUs

Cited by: 35
Authors
Schaefer, Andreas [1]
Fey, Dietmar [1]
Affiliation
[1] Univ Erlangen Nurnberg, Chair Comp Sci Comp Architecture 3, D-91054 Erlangen, Germany
Keywords
stencil codes; GPU; high performance computing; temporal blocking; Jacobi solver; CUDA
DOI
10.1016/j.procs.2011.04.221
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper we investigate how stencil computations can be implemented on state-of-the-art general purpose graphics processing units (GPGPUs). Stencil codes lie at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have recently gained a lot of attention because of their superior floating point performance and memory bandwidth. Nevertheless, memory-bound stencil codes in particular have proven challenging for GPGPUs, yielding lower-than-expected speedups. We chose the Jacobi method as a standard benchmark to evaluate a set of algorithms on NVIDIA's latest Fermi chipset. One of our fastest algorithms is a parallel wavefront update. It exploits the enlarged on-chip shared memory to perform two time step updates per sweep. To the best of our knowledge, it represents the first successful application of temporal blocking for 3D stencils on GPGPUs and thereby exceeds previous results by a considerable margin. This is also the first paper to study stencil codes on Fermi.
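The idea of performing two time step updates per sweep can be illustrated with a small CPU sketch. The Python code below is not the authors' CUDA kernel; it is a minimal illustration assuming a 7-point Jacobi average on a 3D grid with fixed boundary cells. All function names (`update_plane`, `jacobi_step`, `jacobi_two_steps_wavefront`, `make_grid`) are hypothetical, and the rolling window of planes merely stands in for the on-chip shared memory mentioned in the abstract.

```python
def update_plane(below, mid, above):
    """One Jacobi step for a single z-plane; edge cells keep their values."""
    ny, nx = len(mid), len(mid[0])
    out = [row[:] for row in mid]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            out[y][x] = (below[y][x] + above[y][x] +
                         mid[y - 1][x] + mid[y + 1][x] +
                         mid[y][x - 1] + mid[y][x + 1]) / 6.0
    return out


def jacobi_step(grid):
    """Reference: one full Jacobi time step over grid[z][y][x]."""
    nz = len(grid)
    out = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        out[z] = update_plane(grid[z - 1], grid[z], grid[z + 1])
    return out


def jacobi_two_steps_wavefront(grid):
    """Two time steps in a single sweep along z (temporal blocking).

    While sweeping, only a rolling window of planes at time t+1 is kept.
    As soon as planes z-2, z-1 and z are known at t+1, plane z-1 can be
    advanced to t+2, so the full intermediate grid is never stored.
    """
    nz = len(grid)
    out = [[row[:] for row in plane] for plane in grid]
    prev2 = grid[0]   # plane 0 is a fixed boundary, identical at every step
    prev1 = None      # most recent plane at time t+1
    for z in range(1, nz):
        if z < nz - 1:
            cur = update_plane(grid[z - 1], grid[z], grid[z + 1])  # plane z at t+1
        else:
            cur = grid[z]  # last plane is a fixed boundary
        if prev1 is not None:
            out[z - 1] = update_plane(prev2, prev1, cur)  # plane z-1 at t+2
            prev2 = prev1
        prev1 = cur
    return out


def make_grid(n):
    """Small deterministic n*n*n test grid (arbitrary values)."""
    return [[[float((3 * x + 5 * y + 7 * z) % 11) for x in range(n)]
             for y in range(n)] for z in range(n)]
```

Under these assumptions, one wavefront sweep gives the same result as two consecutive calls to `jacobi_step`, while reading the input grid from main memory only once. That reduction in memory traffic is the motivation for temporal blocking in memory-bound stencil codes.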
Pages: 2027-2036
Page count: 10