Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Cited by: 6
Authors
Nasciutti, Thiago Carrijo [1]
Panetta, Jairo [1]
Lopes, Pedro Pais [1]
Affiliations
[1] ITA, Div Ciencia Comp, Sao Jose Dos Campos, SP, Brazil
Source
Funding
European Union Horizon 2020
Keywords
GPGPU; memory hierarchy; stencil computation
DOI
10.1002/cpe.4929
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
This work compares the performance of optimizations that transform replicated global memory accesses into local memory accesses in 3D stencil computations on the NVIDIA Tesla K80 GPGPU. The optimizations reduce the global memory contention caused by the set of multiprocessors. The evaluated optimizations are grid tiling, inserting spatial and temporal loops into kernels, register reuse, and some of their combinations. A standardized experiment evaluates how performance varies with grid size and stencil size for each optimization. Experimental data show that codes that use these optimizations are up to 3.3 times faster than the classical stencil formulation. The data also show that the most profitable optimization varies with grid and stencil sizes.
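As a rough illustration of the register-reuse idea named in the abstract, the sketch below contrasts a classical stencil, which re-reads every neighbour for each output point, with a sliding-window version that keeps the neighbourhood in local storage and fetches only one new value per point. It is not the paper's code: the 1D unweighted stencil, the function names `classical_stencil` and `register_reuse_stencil`, and the read counter standing in for global memory traffic are all assumptions made for this example.

```python
from collections import deque


def classical_stencil(grid, radius):
    """Each output point re-reads its whole neighbourhood from `grid`."""
    reads = 0  # stand-in for global memory accesses (assumption)
    out = []
    for i in range(radius, len(grid) - radius):
        total = 0
        for k in range(-radius, radius + 1):
            total += grid[i + k]
            reads += 1
        out.append(total)
    return out, reads


def register_reuse_stencil(grid, radius):
    """Slide a window of 2*radius+1 values; one new read per output point."""
    width = 2 * radius + 1
    if len(grid) < width:
        return [], 0
    # The deque plays the role of per-thread registers (assumption).
    window = deque(grid[:width], maxlen=width)
    reads = width  # initial fill of the window
    out = [sum(window)]
    for i in range(width, len(grid)):
        window.append(grid[i])  # oldest value drops out automatically
        reads += 1
        out.append(sum(window))
    return out, reads


if __name__ == "__main__":
    grid = list(range(10))
    print(classical_stencil(grid, 1)[1], register_reuse_stencil(grid, 1)[1])
```

Both functions return the same output values; only the number of reads from `grid` differs, and the gap widens as the stencil radius grows. The read counts say nothing by themselves about run time on a real GPGPU.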
Pages: 16