A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Cited by: 0
Authors
Yang Yang (杨杨) [1, 2]
Huimin Cui (崔慧敏) [1, 2]
Xiaobing Feng (冯晓兵) [1]
Jingling Xue (薛京灵) [3]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
[2] Graduate University of Chinese Academy of Sciences
[3] Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales
Funding
National Natural Science Foundation of China
Keywords
stencil computation; circular queue; GPU; occupancy; register
DOI
Not available
Chinese Library Classification (CLC) number
TP391.41
Discipline classification code
080203
Abstract
In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods, which rely on circular queues implemented predominantly in indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations, so that circular queues can be implemented effectively with both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93X over methods that use circular queues implemented with shared memory only.
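The general idea of splitting a circular queue between private and neighbor-visible storage can be illustrated with a small CPU sketch. It is only an illustration under stated assumptions, not the method of the paper: the 5-point Jacobi stencil, the 0.2 weight, and the names `jacobi_reference`, `jacobi_hybrid_queue`, `reg_prev`, `reg_next`, and `shared_cur` are all hypothetical, and the multi-time-step reuse pattern and the automatic placement framework described in the abstract are not reproduced here. Each column plays the role of one GPU thread: the rows above and below are read only by the owning column, so they could plausibly live in registers, while the current row is read by neighboring columns and so stands in for shared memory.

```python
def jacobi_reference(grid):
    """One 5-point Jacobi sweep over the interior of a 2D grid (list of rows).

    Boundary cells are copied unchanged. Reads every operand from the grid.
    """
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            out[y][x] = 0.2 * (grid[y][x] + grid[y - 1][x] + grid[y + 1][x]
                               + grid[y][x - 1] + grid[y][x + 1])
    return out


def jacobi_hybrid_queue(grid):
    """Same sweep, streaming down the rows with a 3-entry circular queue.

    Assumes the grid has at least 3 rows. The queue entries are:
      reg_prev   - row y-1, read only at the same column ("register"-like)
      shared_cur - row y, read at x-1, x, x+1 ("shared memory"-like)
      reg_next   - row y+1, read only at the same column ("register"-like)
    """
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    reg_prev = grid[0][:]
    shared_cur = grid[1][:]
    for y in range(1, ny - 1):
        # Each column loads exactly one new value per step of the stream.
        reg_next = grid[y + 1][:]
        for x in range(1, nx - 1):
            out[y][x] = 0.2 * (shared_cur[x] + reg_prev[x] + reg_next[x]
                               + shared_cur[x - 1] + shared_cur[x + 1])
        # Rotate the queue: older rows are reused, not reloaded from the grid.
        reg_prev, shared_cur = shared_cur, reg_next
    return out


if __name__ == "__main__":
    g = [[float(x * y + x) for x in range(5)] for y in range(6)]
    print(jacobi_hybrid_queue(g) == jacobi_reference(g))
```

In this sketch only one of the three queue entries needs to be visible to neighbors, which is the kind of observation that lets part of a queue move out of shared memory. How the paper actually divides data between registers and shared memory, and how that choice interacts with occupancy, is determined by its framework and is not modeled above.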
Pages: 57-74
Number of pages: 18