A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs

Cited by: 0
Authors
Jin, Guanghao [1]
Endo, Toshio [1]
Matsuoka, Satoshi [2]
Affiliations
[1] Tokyo Inst Technol, JST CREST, Tokyo 152, Japan
[2] Tokyo Inst Technol, JST CREST, NII, Tokyo, Japan
Keywords
stencil computation; GPU cluster; memory capacity; parallel optimization; temporal blocking
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The problem size of stencil computation on a GPU cluster is limited by the memory capacity of the GPUs, which is typically smaller than that of host memory. This paper proposes and evaluates a parallel optimization method for stencil computation that targets scalability, problem sizes larger than the memory capacity of the GPUs, and high performance. It uses 2D decomposition to scale across GPUs, and it allows a bigger sub-domain on each GPU so that the overall problem size can grow. It applies temporal blocking to improve the memory access locality of the stencil computation, and it reuses earlier results to avoid redundant computation and obtain higher performance. Evaluation of a stencil simulation on a 3D domain shows that the new method, applied to 7-point and 19-point stencils on GPUs, achieves good scalability, which is on average 1.45 times and 1.72 times better than other methods, respectively.
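The temporal blocking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1D domain with a 3-point averaging stencil instead of the 3D 7-point and 19-point stencils of the paper, it runs in plain Python rather than on GPUs, and the names `step`, `naive`, `temporal_blocked`, `block`, and `bt` are hypothetical. Each block is loaded once together with a halo of `bt` cells per side, advanced `bt` time steps locally, and only its interior is written back. The halo cells are recomputed redundantly by neighbouring blocks; the reuse of earlier results that the abstract describes as the remedy for this redundancy is not shown here.

```python
def step(u):
    """One Jacobi step of a 3-point averaging stencil; the two end cells stay fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Reference: sweep the whole domain once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporal_blocked(u, steps, block, bt):
    """Advance u by `steps`, loading `block` cells at a time and doing up to `bt` steps per load."""
    n = len(u)
    done = 0
    while done < steps:
        t = min(bt, steps - done)          # time steps performed per load in this round
        out = u[:]
        for start in range(0, n, block):
            end = min(start + block, n)
            lo = max(start - t, 0)         # halo of t cells on each side,
            hi = min(end + t, n)           # clamped at the domain boundary
            local = u[lo:hi]               # stands in for a host-to-device copy
            for _ in range(t):
                local = step(local)        # halo cells become stale, one layer per step
            # After t steps only the interior is still valid; write it back.
            out[start:end] = local[start - lo:end - lo]
        u = out
        done += t
    return u


u0 = [float((i * 7) % 11) for i in range(40)]
reference = naive(u0, 10)
blocked = temporal_blocked(u0, 10, block=8, bt=4)
print(max(abs(a - b) for a, b in zip(reference, blocked)))
```

In this sketch, a larger `bt` means fewer loads of each block but a wider halo, and therefore more redundant work per load. That trade-off is why a scheme that reuses earlier results, as the abstract proposes, can pay off.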
Pages: 8