High Performance Stencil Code Algorithms for GPGPUs

Cited by: 34
Authors
Schaefer, Andreas [1]
Fey, Dietmar [1]
Affiliations
[1] Univ Erlangen Nurnberg, Chair Comp Sci Comp Architecture 3, D-91054 Erlangen, Germany
Keywords
stencil codes; GPU; high performance computing; temporal blocking; Jacobi solver; CUDA
DOI
10.1016/j.procs.2011.04.221
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper we investigate how stencil computations can be implemented on state-of-the-art general-purpose graphics processing units (GPGPUs). Stencil codes lie at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have recently gained a lot of attention because of their superior floating-point performance and memory bandwidth. Nevertheless, memory-bound stencil codes in particular have proven challenging for GPGPUs, yielding lower-than-expected speedups. We chose the Jacobi method as a standard benchmark to evaluate a set of algorithms on NVIDIA's latest Fermi chipset. One of our fastest algorithms is a parallel wavefront update. It exploits the enlarged on-chip shared memory to perform two time step updates per sweep. To the best of our knowledge, it represents the first successful application of temporal blocking for 3D stencils on GPGPUs and exceeds previous results by a considerable margin. This is also the first paper to study stencil codes on Fermi.
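The idea of performing two time step updates per sweep can be illustrated with a minimal sketch. This is not the authors' CUDA wavefront algorithm: it is a simplified, sequential 1D analogue in Python, assuming a 3-point averaging stencil with fixed boundary cells. The function names `jacobi_step` and `jacobi_two_steps_fused` are hypothetical and chosen only for this illustration. The fused version keeps a three-value window of intermediate results, which loosely plays the role that on-chip shared memory plays on the GPU.

```python
def jacobi_step(u):
    """One Jacobi sweep of a 1D 3-point averaging stencil.

    Boundary cells (first and last) are kept fixed.
    """
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def jacobi_two_steps_fused(u):
    """Advance two time steps in a single traversal of the grid.

    The second update trails the first by one cell, so the full
    intermediate grid (time step t+1) is never stored.
    """
    n = len(u)
    if n < 3:
        return list(u)  # no interior cells, nothing to update

    out = list(u)  # boundaries of the result equal the input boundaries

    def first_update(i):
        # Value of cell i after one time step (boundaries stay fixed).
        if i == 0 or i == n - 1:
            return u[i]
        return (u[i - 1] + u[i] + u[i + 1]) / 3.0

    # Sliding window of t+1 values at positions i-2, i-1, i.
    a = None
    b = None
    for i in range(n):
        c = first_update(i)
        if i >= 2:
            # All three t+1 neighbours of cell i-1 are now available.
            out[i - 1] = (a + b + c) / 3.0
        a, b = b, c
    return out


if __name__ == "__main__":
    grid = [0.0, 0.0, 9.0, 0.0, 0.0, 3.0, 1.0]
    reference = jacobi_step(jacobi_step(grid))
    fused = jacobi_two_steps_fused(grid)
    print(reference == fused)
```

Under these assumptions the fused traversal produces the same values as two separate sweeps while touching the grid only once, which is the memory-traffic saving that temporal blocking aims for in memory-bound stencil codes.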
Pages: 2027-2036
Page count: 10