A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs

Times cited: 0

Authors:

Jin, Guanghao [1]

Endo, Toshio [1]

Matsuoka, Satoshi [2]

Affiliations:

[1] Tokyo Inst Technol, JST CREST, Tokyo 152, Japan

[2] Tokyo Inst Technol, JST CREST, NII, Tokyo, Japan

Source:

2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013

Keywords:

stencil computation; GPU cluster; memory capacity; parallel optimization; temporal blocking

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The problem size of stencil computations on GPU clusters is limited by the memory capacity of the GPUs, which is typically smaller than that of host memory. This paper proposes and evaluates a parallel optimization method for stencil computation that achieves scalability, problem sizes larger than the memory capacity of the GPUs, and high performance. It uses 2D decomposition to achieve scalability across GPUs, and it supports larger sub-domains on each GPU to allow larger problem sizes. It applies temporal blocking to improve the memory access locality of stencil computation, and it reuses previously computed results to eliminate redundant computation, which yields higher performance. Evaluation of stencil simulations on a 3D domain shows that the new method achieves good scalability for 7-point and 19-point stencils on GPUs and is on average 1.45 times and 1.72 times better than other methods, respectively.
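The temporal blocking idea in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' implementation. It assumes a 3D 7-point stencil with hypothetical weights `C0` and `C1`, fixed (Dirichlet) boundaries, and a domain cut into slabs along z only. The helper names (`step`, `naive`, `temporal_blocked`) are likewise hypothetical. Each slab of `bz` planes is loaded together with up to `k` halo planes on each side, advanced `k` time steps locally, and only its own planes are written back. A slab buffer therefore never holds more than `bz + 2k` planes, which is how a domain larger than device memory can be advanced several steps per transfer. Neighbouring slabs recompute each other's halo planes; that overlap is presumably the kind of redundant computation the abstract says the method removes by reusing earlier results. This sketch does not attempt that reuse.

```python
"""Minimal temporal-blocking sketch for a 3D 7-point stencil (not the authors' code).

Assumptions: fixed (Dirichlet) boundaries, hypothetical weights C0/C1, and a
domain cut into z-slabs that are processed one after another, as if each slab
were copied from host memory into a small GPU buffer.
"""

C0, C1 = 0.4, 0.1  # hypothetical weights: centre cell and each of the 6 neighbours


def step(u, nz, ny, nx):
    """Apply one 7-point stencil step to a flat row-major nz*ny*nx list.

    Boundary cells (including the first and last z-plane) are left unchanged.
    Returns the new list and the number of cells updated.
    """
    plane = ny * nx
    v = list(u)
    count = 0
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                c = (z * ny + y) * nx + x
                v[c] = C0 * u[c] + C1 * (
                    u[c - 1] + u[c + 1]            # x neighbours
                    + u[c - nx] + u[c + nx]        # y neighbours
                    + u[c - plane] + u[c + plane]  # z neighbours
                )
                count += 1
    return v, count


def naive(u, nz, ny, nx, k):
    """Reference: advance the whole domain k steps (needs it all in memory)."""
    total = 0
    for _ in range(k):
        u, n = step(u, nz, ny, nx)
        total += n
    return u, total


def temporal_blocked(u, nz, ny, nx, k, bz):
    """Advance k steps by processing z-slabs of bz planes one at a time.

    Each slab is loaded with up to k halo planes on each side and stepped k
    times locally. The halo goes stale by one plane per step from each cut
    edge, so after k steps only the slab's own bz planes are copied back.
    """
    plane = ny * nx
    out = list(u)
    total = 0
    for z0 in range(0, nz, bz):
        z1 = min(z0 + bz, nz)
        lo, hi = max(0, z0 - k), min(nz, z1 + k)  # slab plus halo, clipped to domain
        slab = u[lo * plane:hi * plane]            # stands in for a host-to-GPU copy
        for _ in range(k):
            slab, n = step(slab, hi - lo, ny, nx)
            total += n
        # planes z0..z1-1 are exact after k steps; write them back
        out[z0 * plane:z1 * plane] = slab[(z0 - lo) * plane:(z1 - lo) * plane]
    return out, total


if __name__ == "__main__":
    nz, ny, nx, k = 12, 5, 5, 2
    u0 = [float((i * 7) % 11) for i in range(nz * ny * nx)]  # non-uniform field
    ref, work_ref = naive(u0, nz, ny, nx, k)
    blk, work_blk = temporal_blocked(u0, nz, ny, nx, k, bz=4)
    print("results match:", ref == blk)
    print("cell updates, naive:", work_ref, "blocked:", work_blk)
```

With `nz = 12`, `bz = 4` and `k = 2`, the blocked version produces the same field as whole-domain stepping. It performs 252 cell updates against 180, and the extra 72 updates are the recomputed halo planes.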

Pages: 8