A compression-based memory-efficient optimization for out-of-core GPU stencil computation

Cited by: 0

Authors:

Jingcheng Shen

Linbo Long

Xin Deng

Masao Okita

Fumihiko Ino

Affiliations:

[1] Chongqing University of Posts and Telecommunications, College of Computer Science and Technology

[2] Osaka University, Graduate School of Information Science and Technology

Source:

The Journal of Supercomputing | 2023 / Vol. 79

Keywords:

On-the-fly compression; Stencil computation; Out-of-core; GPU

DOI:

Not available

Abstract:

Out-of-core stencil codes handle data that exceeds the memory capacity of a GPU. However, such codes require frequent data transfers between the CPU and GPU, which often limit overall performance. In this work, we propose a compression-based, memory-efficient method to accelerate out-of-core stencil codes. First, an on-the-fly compression technique is integrated into the out-of-core computation to reduce CPU-GPU data transfers. Second, a single-working-buffer strategy is employed to reduce GPU memory usage, so that more data can be kept on the GPU for reuse, which in turn allows more temporal blocking steps. Experimental results show that the proposed method reduces GPU memory usage by 21%, creating space to double the number of temporal blocking steps compared with the codes without compression. On an NVIDIA Tesla V100 GPU, the proposed method helps high-order, data-transfer-bound stencil codes achieve speedups of up to 2.09× in single-precision and up to 1.92× in double-precision floating-point format over the codes without compression.
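The two ideas the abstract combines, compressed chunk transfers and temporal blocking, can be illustrated with a small CPU-only sketch. It is not the authors' implementation. It assumes a 1D three-point averaging stencil with fixed boundary cells, uses lossless `zlib` as a stand-in for whatever compressor the paper uses, and models "host memory" as a list of compressed chunks. The names `out_of_core`, `chunk`, and `k` are hypothetical. The single-working-buffer strategy is not modeled here.

```python
import zlib
from array import array

RADIUS = 1  # assumed 3-point stencil; the paper targets high-order stencils


def step(u):
    """One averaging step; the first and last cells are held fixed."""
    out = list(u)
    for i in range(RADIUS, len(u) - RADIUS):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def in_core(u, steps):
    """Reference: the whole domain fits in memory."""
    for _ in range(steps):
        u = step(u)
    return u


def compress(values):
    return zlib.compress(array("d", values).tobytes())


def decompress(blob):
    a = array("d")
    a.frombytes(zlib.decompress(blob))
    return a.tolist()


def out_of_core(u, total_steps, chunk, k):
    """Process the domain chunk by chunk, doing k time steps per transfer."""
    n = len(u)
    halo = k * RADIUS  # extra cells needed on each side for k steps
    assert n % chunk == 0 and halo <= chunk and total_steps % k == 0
    host = [compress(u[a:a + chunk]) for a in range(0, n, chunk)]
    moved = 0  # compressed bytes "transferred" in both directions
    for _ in range(total_steps // k):
        new_host = []  # results go to a separate host copy
        for c in range(len(host)):
            # A compressed chunk cannot be sliced, so whole neighbours move.
            first = max(c - 1, 0)
            blobs = host[first:c + 2]
            moved += sum(len(b) for b in blobs)
            region = [x for b in blobs for x in decompress(b)]
            start, base = c * chunk, first * chunk
            lo, hi = max(start - halo, 0), min(start + chunk + halo, n)
            work = region[lo - base:hi - base]
            for _ in range(k):  # temporal blocking: k steps, one transfer
                work = step(work)
            # Halo cells are stale after k steps; keep only the interior.
            blob = compress(work[start - lo:start - lo + chunk])
            moved += len(blob)
            new_host.append(blob)
        host = new_host
    return [x for b in host for x in decompress(b)], moved


u0 = [float((i * 7) % 13) for i in range(64)]
result, moved_bytes = out_of_core(u0, total_steps=8, chunk=16, k=4)
print(result == in_core(u0, 8), moved_bytes)
```

Doubling `k` halves the number of passes over the host data, which is why freeing GPU memory for more temporal blocking steps reduces transfer volume. How much compression helps depends on the data and the compressor, so the sketch only reports the byte count and makes no claim about it.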


Pages: 11055-11077

Number of pages: 22

42 items in total

[1] A compression-based memory-efficient optimization for out-of-core GPU stencil computation
Shen, Jingcheng
Long, Linbo
Deng, Xin
Okita, Masao
Ino, Fumihiko
[J]. JOURNAL OF SUPERCOMPUTING, 2023, 79 (10): 11055 - 11077
[2] Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression
Shen, Jingcheng
Wu, Yifan
Okita, Masao
Ino, Fumihiko
[J]. PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT 2021, 2022, 13148 : 3 - 14
[3] A Data-Centric Directive-Based Framework to Accelerate Out-of-Core Stencil Computation on a GPU
Shen, Jingcheng
Ino, Fumihiko
Farres, Albert
Hanzich, Mauricio
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (12): 2421 - 2434
[4] A data-centric directive-based framework to accelerate out-of-core stencil computation on a GPU
Shen, Jingcheng
Ino, Fumihiko
Farrés, Albert
Hanzich, Mauricio
[J]. IEICE Transactions on Information and Systems, 2020, E103D (12): 2421 - 2434
[5] An Extension of OpenACC Directives for Out-of-Core Stencil Computation with Temporal Blocking
Miki, Nobuhiro
Ino, Fumihiko
Hagihara, Kenichi
[J]. PROCEEDINGS OF WACCPD 2016: THIRD WORKSHOP ON ACCELERATOR PROGRAMMING USING DIRECTIVES, 2016: 36 - 45
[6] Evaluation of Flash-based Out-of-core Stencil Computation Algorithms for SSD-Equipped Clusters
Midorikawa, Hiroko
Tan, Hideyuki
[J]. 2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016: 1031 - 1040
[7] AN EFFICIENT OUT-OF-CORE VOLUME RENDERING METHOD BASED ON RAY CASTING AND GPU ACCELERATION
Xue, Jian
Lue, Ke
Tian, Jie
[J]. 2009 IEEE YOUTH CONFERENCE ON INFORMATION, COMPUTING AND TELECOMMUNICATION, PROCEEDINGS, 2009, : 130 - +
[8] Efficient Utilization of Memory Hierarchy to Enable the Computation on Bigger Domains for Stencil Computation in CPU-GPU Based Systems
Jin, Guanghao
Lin, James
Endo, Toshio
[J]. 2014 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND APPLICATIONS (ICHPCA), 2014,
[9] A Parallel Memory Efficient Framework for Out-of-Core Mesh simplification
Lu Yongquan
Li Nan
Gao Pengdong
Qiu Chu
Wang Jintao
Lv Rui
[J]. HPCC: 2009 11TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2009: 666 - 671
[10] FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks
Patel, Shaurya
Liu, Tongping
Guan, Hui
[J]. PROCEEDINGS OF MCHPC 2021: WORKSHOP ON MEMORY CENTRIC HIGH PERFORMANCE COMPUTING, 2021: 1 - 8
