Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Cited by: 6

Authors:

Nasciutti, Thiago Carrijo [1]

Panetta, Jairo [1]

Lopes, Pedro Pais [1]

Affiliations:

[1] ITA, Div Ciencia Comp, Sao Jose Dos Campos, SP, Brazil

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2019 / Vol. 31 / Issue 18

Funding:

European Union Horizon 2020;

Keywords:

GPGPU; memory hierarchy; stencil computation;

DOI:

10.1002/cpe.4929

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

This work compares the performance of optimizations that transform replicated global memory accesses into local memory accesses in 3D stencil computations on the NVIDIA Tesla K80 GPGPU. The optimizations reduce the global memory contention caused by the set of multiprocessors. The evaluated optimizations are grid tiling, inserting spatial and temporal loops into kernels, register reuse, and some of their combinations. A standardized experiment evaluates how performance varies with grid size and stencil size for each optimization. Experimental data show that codes using these optimizations are up to 3.3 times faster than the classical stencil formulation. They also show that the most profitable optimization varies with grid and stencil sizes.
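The idea of replacing replicated global memory accesses with local ones can be illustrated with a small CPU-side model. The sketch below is not the authors' code and does not run on a GPU: it assumes a radius-1, 7-point stencil on a cubic grid, treats a flat Python list as "global memory" and a per-tile list as the "local" buffer, and simply counts global reads. The function names (`stencil_naive`, `stencil_tiled`), the tile edge `t`, and the one-cell halo are all illustrative assumptions.

```python
def at(buf, x, y, z, edge):
    """Read one cell from a cubic buffer stored as a flat list."""
    return buf[(z * edge + y) * edge + x]


def seven_point(buf, x, y, z, edge):
    """Sum of a cell and its six face neighbors (radius-1 stencil)."""
    return (at(buf, x, y, z, edge)
            + at(buf, x - 1, y, z, edge) + at(buf, x + 1, y, z, edge)
            + at(buf, x, y - 1, z, edge) + at(buf, x, y + 1, z, edge)
            + at(buf, x, y, z - 1, edge) + at(buf, x, y, z + 1, edge))


def stencil_naive(grid, n):
    """Classical form: every interior point reads all 7 inputs from `grid`."""
    out = [0.0] * (n ** 3)
    global_reads = 0
    for z in range(1, n - 1):
        for y in range(1, n - 1):
            for x in range(1, n - 1):
                out[(z * n + y) * n + x] = seven_point(grid, x, y, z, n)
                global_reads += 7
    return out, global_reads


def stencil_tiled(grid, n, t):
    """Tiled form: each t*t*t tile copies itself plus a one-cell halo
    into a local buffer once, then computes only from that buffer."""
    assert (n - 2) % t == 0, "sketch assumes tiles cover the interior exactly"
    m = t + 2  # local buffer edge: tile plus halo on both sides
    out = [0.0] * (n ** 3)
    global_reads = 0
    starts = range(1, n - 1, t)
    for z0 in starts:
        for y0 in starts:
            for x0 in starts:
                # One pass over global memory per tile (x index varies fastest).
                local = [at(grid, x0 - 1 + i, y0 - 1 + j, z0 - 1 + k, n)
                         for k in range(m) for j in range(m) for i in range(m)]
                global_reads += m ** 3
                for k in range(1, t + 1):
                    for j in range(1, t + 1):
                        for i in range(1, t + 1):
                            gx, gy, gz = x0 - 1 + i, y0 - 1 + j, z0 - 1 + k
                            out[(gz * n + gy) * n + gx] = seven_point(local, i, j, k, m)
    return out, global_reads


if __name__ == "__main__":
    n, t = 10, 4
    grid = [float((i * 7) % 11) for i in range(n ** 3)]
    _, naive_reads = stencil_naive(grid, n)
    _, tiled_reads = stencil_tiled(grid, n, t)
    print("global reads, naive vs tiled:", naive_reads, tiled_reads)
```

In this model the naive form costs 7 global reads per interior point, while the tiled form costs (t + 2)^3 reads per tile of t^3 points. The read count says nothing about actual GPU run time, which also depends on caches, coalescing, and occupancy; that is why the abstract reports measured speedups rather than access counts.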


Pages: 16
