A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Cited by: 0

Authors:

Yang Yang (杨杨) ^{[1,2]}

Cui Huimin (崔慧敏) ^{[1,2]}

Feng Xiaobing (冯晓兵) ^{[1]}

Xue Jingling (薛京灵) ^{[3]}

Affiliations:

[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences

[2] Graduate University of Chinese Academy of Sciences

[3] Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales

Source:

Journal of Computer Science & Technology | 2012, Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

stencil computation; circular queue; GPU; occupancy; register;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.41;

Discipline classification code:

080203;

Abstract:

In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods, which rely on circular queues implemented predominantly in indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations, so that circular queues can be implemented effectively and in a balanced manner using both shared memory and registers. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93X over methods that use circular queues implemented with shared memory only.
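
As a rough illustration of the plain circular-queue idea that the abstract builds on (not the authors' hybrid method or their framework), the Python sketch below sweeps a 2D grid row by row with a 5-point Jacobi stencil while holding only three input rows in a fixed-size circular buffer. On a GPU the queue slots would sit in shared memory or registers; here an ordinary list stands in for them. The function name `jacobi_sweep_circular_queue`, the stencil weights, and the copy-through boundary handling are all assumptions made for this example.

```python
def jacobi_sweep_circular_queue(grid):
    """One 5-point Jacobi sweep over the interior of a 2D grid (list of lists).

    Only three input rows are held at a time in a circular queue.
    Boundary cells are copied unchanged (an assumption of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundaries keep their input values
    if rows < 3 or cols < 3:
        return out  # no interior cells to update

    # Three slots; slot k % 3 holds input row k while that row is live.
    queue = [grid[0][:], grid[1][:], None]
    for i in range(1, rows - 1):
        # Load row i+1 into the slot that held the oldest row (i-2).
        queue[(i + 1) % 3] = grid[i + 1][:]
        north = queue[(i - 1) % 3]
        center = queue[i % 3]
        south = queue[(i + 1) % 3]
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (north[j] + south[j]
                                + center[j - 1] + center[j + 1])
    return out
```

Each input row is loaded once and reused for three consecutive output rows, which is the reuse a circular queue is meant to capture. The trade-off the abstract describes, deciding which slots go to registers and which to shared memory, is not modeled here.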

Pages: 57-74

Page count: 18
