Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Cited by: 6
Authors
Nasciutti, Thiago Carrijo [1]
Panetta, Jairo [1]
Lopes, Pedro Pais [1]
Affiliations
[1] ITA, Div Ciencia Comp, Sao Jose Dos Campos, SP, Brazil
Source
Funding
EU Horizon 2020;
Keywords
GPGPU; memory hierarchy; stencil computation;
DOI
10.1002/cpe.4929
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
This work compares the performance of optimizations that transform replicated global memory accesses into local memory accesses in 3D stencil computations on the NVIDIA Tesla K80 GPGPU. These optimizations reduce the global memory contention caused by the set of multiprocessors. The evaluated optimizations are grid tiling, inserting spatial and temporal loops into kernels, register reuse, and some of their combinations. A standardized experiment evaluates how performance varies with grid size and stencil size for each optimization. Experimental data show that codes using these optimizations are up to 3.3 times faster than the classical stencil formulation. The data also show that the most profitable optimization varies with grid and stencil sizes.
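As a rough illustration of what "replicated global memory accesses" means, the sketch below counts global loads for a classical formulation and for a tiled one. It is not the authors' code: the paper's kernels target a GPGPU, while this is plain Python bookkeeping. The function names, the star-shaped stencil, and the cubic tile are assumptions made only for this example, and load counts are not a prediction of run time.

```python
def classical_loads(n, r):
    """Global loads when every interior point reads its whole stencil.

    Assumes (hypothetically) a star-shaped stencil of radius r on an
    n x n x n grid: one centre point plus r neighbours in each of the
    six axis directions, i.e. 6*r + 1 loads per output point.
    """
    interior = (n - 2 * r) ** 3
    return interior * (6 * r + 1)


def tiled_loads(n, r, t):
    """Global loads when each t x t x t tile is staged once in local memory.

    Each tile loads its own t**3 points plus six halo slabs of depth r
    (a star stencil never touches the edge or corner halo regions).
    """
    side = n - 2 * r
    assert side % t == 0, "sketch assumes tiles divide the interior evenly"
    tiles = (side // t) ** 3
    per_tile = t ** 3 + 6 * r * t ** 2
    return tiles * per_tile


if __name__ == "__main__":
    n, r, t = 136, 4, 16  # illustrative sizes, not taken from the paper
    print(classical_loads(n, r), tiled_loads(n, r, t))
```

With these illustrative sizes the tiled count is one tenth of the classical count. The ratio changes with `n`, `r`, and `t`, which is consistent with the abstract's remark that the most profitable optimization depends on grid and stencil sizes.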
Pages: 16