A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Cited by: 0
Authors
杨杨 [1, 2]
崔慧敏 [1, 2]
冯晓兵 [1 ]
薛京灵 [3 ]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
[2] Graduate University of Chinese Academy of Sciences
[3] Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales
Funding
National Natural Science Foundation of China
Keywords
stencil computation; circular queue; GPU; occupancy; register
DOI
Not available
Chinese Library Classification number
TP391.41
Discipline classification code
080203
Abstract
In this paper, we present a hybrid circular queue method that significantly boosts the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Earlier methods rely on circular queues implemented predominantly in shared memory, which is indirectly addressable. Our hybrid method instead exploits a new reuse pattern that spans multiple time steps of a stencil computation, so that circular queues can be implemented effectively, and in a balanced manner, using both shared memory and registers. We describe a framework that automatically finds the best placement of data in registers and shared memory to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93x over methods that use circular queues implemented with shared memory only.
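The distinction the abstract draws between indirectly addressable storage and registers can be illustrated with a small CPU-side sketch. This is not the authors' implementation and contains no GPU code: it assumes a simple 3-point averaging stencil along one column, and the function names (`stencil_reference`, `stencil_indexed_queue`, `stencil_register_queue`) are hypothetical. The indexed variant advances a head index modulo the queue length, which requires storage that can be addressed by a computed index, as shared memory can. The register-style variant keeps each queue slot in a named scalar and rotates values explicitly, which is how a queue has to be written when slots cannot be indexed at run time.

```cpp
#include <cstddef>
#include <vector>

// Reference: 3-point average along one column; boundary values are copied through.
std::vector<double> stencil_reference(const std::vector<double>& in) {
    std::vector<double> out(in);
    for (std::size_t i = 1; i + 1 < in.size(); ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    return out;
}

// Circular queue in an indexable buffer (stand-in for shared memory):
// the head index advances modulo 3 and the oldest slot is overwritten.
std::vector<double> stencil_indexed_queue(const std::vector<double>& in) {
    std::vector<double> out(in);
    if (in.size() < 3) return out;
    double q[3] = {in[0], in[1], in[2]};
    std::size_t head = 0;  // slot currently holding element i - 1
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = (q[head] + q[(head + 1) % 3] + q[(head + 2) % 3]) / 3.0;
        if (i + 2 < in.size()) q[head] = in[i + 2];  // load next element over the oldest
        head = (head + 1) % 3;
    }
    return out;
}

// Circular queue in named scalars (stand-in for registers):
// no computed indexing, so values are shifted between variables instead.
std::vector<double> stencil_register_queue(const std::vector<double>& in) {
    std::vector<double> out(in);
    if (in.size() < 3) return out;
    double behind = in[0], current = in[1], front = in[2];
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = (behind + current + front) / 3.0;
        behind = current;   // rotate the queue by one position
        current = front;
        if (i + 2 < in.size()) front = in[i + 2];  // load next element
    }
    return out;
}
```

Both queue variants read each input element once and produce the same result as the reference loop. How to split a queue between the two kinds of storage on a real GPU, which is the question the paper's framework addresses, is outside the scope of this sketch.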
Pages: 57-74
Page count: 18