A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Cited by: 0

Authors:

Yang Yang (杨杨) [1,2]

Hui-Min Cui (崔慧敏) [1,2]

Xiao-Bing Feng (冯晓兵) [1]

Jing-Ling Xue (薛京灵) [3]

Affiliations:

[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences

[2] Graduate University of Chinese Academy of Sciences

[3] Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales

Source:

Journal of Computer Science & Technology | 2012, Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

stencil computation; circular queue; GPU; occupancy; register;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.41;

Discipline classification code:

080203;

Abstract:

In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods that rely on circular queues implemented predominantly in indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations, so that circular queues can be implemented effectively in both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93X over methods that use circular queues implemented with shared memory only.
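As a rough illustration of the circular-queue idea the abstract refers to, the sketch below sweeps a 2D grid row by row and keeps only three input rows in a queue whose slots are reused modulo 3. It is a plain CPU sketch under stated assumptions, not the paper's hybrid method: the 5-point Jacobi stencil, the function names, and the queue size are all chosen here for illustration, and the placement of queue slots in registers versus shared memory is not modeled at all.

```python
def jacobi_step_reference(grid):
    """One 5-point Jacobi step computed directly; boundary cells are copied."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return out


def jacobi_step_circular_queue(grid):
    """Same step, but each input row is loaded once into a 3-slot circular queue."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    if rows < 3:
        return out                      # no interior rows to update
    queue = [None, None, None]          # hypothetical stand-in for on-chip storage
    queue[0] = grid[0][:]               # "load" row 0
    queue[1] = grid[1][:]               # "load" row 1
    for i in range(1, rows - 1):
        # Load the next row into the slot that held row i-2, which is no longer needed.
        queue[(i + 1) % 3] = grid[i + 1][:]
        up, mid, down = queue[(i - 1) % 3], queue[i % 3], queue[(i + 1) % 3]
        for j in range(1, cols - 1):
            out[i][j] = (mid[j] + up[j] + down[j] + mid[j - 1] + mid[j + 1]) / 5.0
    return out
```

Both functions return the same grid; the queue version only changes where the input values are read from. On a GPU, the modulo-indexed slots would need storage that can be addressed indirectly, which is the property of shared memory that the abstract mentions.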

Pages: 57-74

Page count: 18
