A compression-based memory-efficient optimization for out-of-core GPU stencil computation

Cited by: 1

Authors:

Shen, Jingcheng [1]

Long, Linbo [1]

Deng, Xin [1]

Okita, Masao [2]

Ino, Fumihiko [2]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, 2 Chongwen Rd, Chongqing 400065, Peoples R China

[2] Osaka Univ, Grad Sch Informat Sci & Technol, 1-5 Yamadaoka, Suita, Osaka 5650871, Japan

Source:

JOURNAL OF SUPERCOMPUTING | 2023, Vol. 79, No. 10

Funding:

Japan Society for the Promotion of Science; National Natural Science Foundation of China

Keywords:

On-the-fly compression; Stencil computation; Out-of-core; GPU; LOSSY COMPRESSION; ALGORITHM;

DOI:

10.1007/s11227-023-05103-8

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

A code for out-of-core stencil computation manages data that exceeds the memory capacity of a GPU. However, such a code requires frequent data transfers between the CPU and GPU, which often impede overall performance. In this work, we propose a compression-based, memory-efficient method to accelerate out-of-core stencil codes. First, an on-the-fly compression technique is integrated into the out-of-core computation to reduce CPU-GPU data transfers. Second, a single-working-buffer strategy is employed to reduce GPU memory usage, so that more data can be stored on the GPU for reuse and the number of temporal blocking steps can be increased. Experimental results demonstrate that the proposed method reduces GPU memory usage by 21%, thereby creating space for doubling the number of temporal blocking steps compared to the codes without compression. The proposed method helps high-order, data-transfer-bound stencil codes achieve speedups of up to 2.09x for the single-precision floating-point format and up to 1.92x for the double-precision floating-point format on an NVIDIA Tesla V100 GPU, in comparison with the codes without compression.
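The interplay of out-of-core chunking, temporal blocking, and compressed transfers described in the abstract can be illustrated with a minimal CPU-only sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the 1D 3-point averaging stencil, the function names, the use of lossless `zlib` as a stand-in for the GPU compressor, and the Python list that plays the role of device memory. Each chunk is loaded with a halo of `k` cells per side so that `k` time steps can run before the chunk is written back.

```python
import zlib
from array import array

def stencil_step(u):
    """One 3-point averaging step; the two end cells are held fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out

def in_core(u, steps):
    """Reference result: the whole domain fits in memory."""
    for _ in range(steps):
        u = stencil_step(u)
    return u

def pack(values):
    """Hypothetical stand-in for on-the-fly compression before a transfer."""
    return zlib.compress(array("d", values).tobytes())

def unpack(blob):
    out = array("d")
    out.frombytes(zlib.decompress(blob))
    return list(out)

def out_of_core(u, steps, chunk, k):
    """Process u chunk by chunk, advancing up to k time steps per chunk visit."""
    n = len(u)
    moved = 0  # compressed bytes "transferred" in both directions
    done = 0
    while done < steps:
        kk = min(k, steps - done)
        nxt = list(u)
        for start in range(0, n, chunk):
            stop = min(start + chunk, n)
            lo = max(start - kk, 0)   # halo of kk cells on each side,
            hi = min(stop + kk, n)    # clipped at the domain boundary
            blob = pack(u[lo:hi])     # "host to device"
            moved += len(blob)
            local = unpack(blob)
            for _ in range(kk):       # temporal blocking: kk steps per load
                local = stencil_step(local)
            result = pack(local[start - lo:stop - lo])  # "device to host"
            moved += len(result)
            nxt[start:stop] = unpack(result)
        u = nxt
        done += kk
    return u, moved
```

Stale values at the held-fixed ends of a local buffer spread inward by one cell per step, so after `kk` steps they have reached only the halo cells, which are discarded. A larger `k` means fewer passes over the data at the cost of wider halos, which is why freeing device memory for more temporal blocking steps can pay off.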


Pages: 11055-11077

Page count: 23

