A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Authors
Yang Yang (杨杨) [1, 2]
Hui-Min Cui (崔慧敏) [1, 2]
Xiao-Bing Feng (冯晓兵) [1]
Jing-Ling Xue (薛京灵) [3]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
[2] Graduate University of Chinese Academy of Sciences
[3] Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales
Funding
National Natural Science Foundation of China
Keywords
stencil computation; circular queue; GPU; occupancy; register
DOI
Not available
Chinese Library Classification number
TP391.41
Discipline classification code
080203
Abstract
In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods, which rely on circular queues implemented predominantly in indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps of a stencil computation, so that circular queues can be implemented effectively with both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93x over methods that use circular queues implemented with shared memory only.
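The circular queue idea mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the paper's hybrid method or its placement framework. It only shows, under simplifying assumptions, how a sweep over a 2D grid can keep just three rows resident in a rotating queue while applying a 5-point Jacobi stencil. The function names `jacobi_step_direct` and `jacobi_step_queue` are hypothetical, and plain Python lists stand in for the GPU storage (shared memory or registers) that the abstract refers to.

```python
# Hypothetical illustration only: a 3-slot circular queue of rows
# for one 5-point Jacobi step. Not the paper's implementation.

def jacobi_step_direct(grid):
    """Reference version: reads neighbours straight from the full grid.
    Boundary cells are copied unchanged."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def jacobi_step_queue(grid):
    """Same step, but only three rows are resident at any time.
    Row r always lives in queue slot r % 3."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    if n < 3:
        return out  # no interior rows to update
    queue = [grid[0][:], grid[1][:], None]
    for i in range(1, n - 1):
        # Row i-2 is no longer needed, so its slot is reused for row i+1.
        queue[(i + 1) % 3] = grid[i + 1][:]
        up = queue[(i - 1) % 3]
        mid = queue[i % 3]
        down = queue[(i + 1) % 3]
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (up[j] + down[j] + mid[j - 1] + mid[j + 1])
    return out
```

The slot index `r % 3` is computed at run time, which is the kind of indirect addressing the abstract associates with shared memory. GPU registers generally cannot be indexed dynamically, so a register-resident queue would typically be rotated through explicit copies or loop unrolling instead; how the paper balances the two is not shown here.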
Pages: 57-74
Page count: 18
Related papers
50 in total
  • [1] A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs
    Yang, Yang
    Cui, Hui-Min
    Feng, Xiao-Bing
    Xue, Jing-Ling
    [J]. Journal of Computer Science and Technology, 2012, 27 (01): 57 - 74
  • [2] TOAST: Automatic tiling for iterative stencil computations on GPUs
    Rocha, Rodrigo C. O.
    Pereira, Alyson D.
    Ramos, Luiz
    Goes, Luis F. W.
    [J]. Concurrency and Computation-Practice & Experience, 2017, 29 (08)
  • [3] EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs
    de Castro, Manuel
    Santamaria-Valenzuela, Inmaculada
    Torres, Yuri
    Gonzalez-Escribano, Arturo
    Llanos, Diego R.
    [J]. Journal of Supercomputing, 2023, 79 (09): 9409 - 9442
  • [4] Register Caching for Stencil Computations on GPUs
    Falch, Thomas L.
    Elster, Anne C.
    [J]. 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2014), 2014: 479 - 486
  • [5] Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE
    Cecilia, José M.
    Abellán, José L.
    Fernández, Juan
    Acacio, Manuel E.
    García, José M.
    Ujaldón, Manuel
    [J]. Journal of Supercomputing, 2012, 62 (02): 787 - 803
  • [6] Double precision stencil computations on Kepler GPUs
    Vizitiu, Anamaria
    Itu, Lucian
    Lazar, Laszlo
    Suciu, Constantin
    [J]. 2014 18th International Conference System Theory, Control and Computing (ICSTCC), 2014: 123 - 127