Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Cited by: 6

Authors:

Nasciutti, Thiago Carrijo [1]

Panetta, Jairo [1]

Lopes, Pedro Pais [1]

Affiliation:

[1] ITA, Div Ciencia Comp, Sao Jose Dos Campos, SP, Brazil

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2019, Vol. 31, No. 18

Funding:

EU Horizon 2020;

Keywords:

GPGPU; memory hierarchy; stencil computation;

DOI:

10.1002/cpe.4929

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

This work compares the performance of optimizations that transform replicated global memory accesses into local memory accesses in 3D stencil computations on the NVIDIA Tesla K80 GPGPU. The optimizations reduce the global memory contention caused by the set of multiprocessors. The evaluated optimizations are grid tiling, inserting spatial and temporal loops into kernels, register reuse, and some of their combinations. A standardized experiment evaluates how performance varies with grid size and stencil size for each optimization. Experimental data show that codes that use these optimizations are up to 3.3 times faster than the classical stencil formulation. The data also show that the most profitable optimization varies with grid and stencil sizes.
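To make the idea of replicated global memory accesses concrete, the following is a minimal sketch, not the authors' code. It assumes a star-shaped 3D stencil and counts how many global reads a classical formulation issues compared with a grid-tiled one in which each tile first copies its points plus halo into a local buffer. The function names, grid size, stencil radius, and tile size are all hypothetical, and the model counts reads only; it says nothing about execution time on a real GPGPU.

```python
from itertools import product

def star_offsets(radius):
    """Offsets of a 3D star stencil: the center plus `radius` points each way along every axis."""
    offsets = [(0, 0, 0)]
    for axis in range(3):
        for dist in range(1, radius + 1):
            for sign in (-1, 1):
                off = [0, 0, 0]
                off[axis] = sign * dist
                offsets.append(tuple(off))
    return offsets

def naive_reads(n, radius):
    """Global reads when every interior output point loads all of its neighbours itself."""
    interior_points = (n - 2 * radius) ** 3
    return interior_points * len(star_offsets(radius))

def tiled_reads(n, radius, tile):
    """Global reads when each tile loads every distinct point it needs exactly once."""
    offsets = star_offsets(radius)
    hi = n - radius  # exclusive upper bound of the interior region
    total = 0
    for bx, by, bz in product(range(radius, hi, tile), repeat=3):
        needed = set()  # models the contents of the tile's local buffer
        for x, y, z in product(range(bx, min(bx + tile, hi)),
                               range(by, min(by + tile, hi)),
                               range(bz, min(bz + tile, hi))):
            for dx, dy, dz in offsets:
                needed.add((x + dx, y + dy, z + dz))
        total += len(needed)  # one global read per distinct point per tile
    return total

if __name__ == "__main__":
    n, radius, tile = 12, 2, 4  # hypothetical sizes chosen only for illustration
    print("classical formulation:", naive_reads(n, radius))
    print("grid tiling:", tiled_reads(n, radius, tile))
```

In this model the classical count grows with the number of stencil points, while the tiled count grows only with the tile volume plus its halo, so the gap between the two depends on both the stencil radius and the tile size.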

Pages: 16
