A compression-based memory-efficient optimization for out-of-core GPU stencil computation

Cited by: 0

Authors:

Jingcheng Shen

Linbo Long

Xin Deng

Masao Okita

Fumihiko Ino

Affiliations:

[1] Chongqing University of Posts and Telecommunications, College of Computer Science and Technology

[2] Osaka University, Graduate School of Information Science and Technology

Source:

The Journal of Supercomputing | 2023 / Vol. 79

Keywords:

On-the-fly compression; Stencil computation; Out-of-core; GPU

DOI:

Not available

Abstract:

Out-of-core stencil codes process data sets that exceed the memory capacity of a GPU. However, such codes require frequent data transfers between the CPU and GPU, which often limit overall performance. In this work, we propose a compression-based, memory-efficient method to accelerate out-of-core stencil codes. First, an on-the-fly compression technique is integrated into the out-of-core computation to reduce CPU-GPU data transfers. Second, a single-working-buffer strategy is employed to reduce GPU memory usage, so that more data can be kept on the GPU for reuse, which allows more temporal blocking steps. Experimental results demonstrate that the proposed method reduces GPU memory usage by 21%, freeing enough space to double the number of temporal blocking steps compared with the codes without compression. The proposed method helps high-order, data-transfer-bound stencil codes achieve speedups of up to 2.09× in single-precision floating-point format and up to 1.92× in double-precision floating-point format on an NVIDIA Tesla V100 GPU, compared with the codes without compression.
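The abstract combines two ways of cutting CPU-GPU traffic: temporal blocking (fusing several time steps into one host-device round trip, at the cost of redundantly recomputed halo cells) and compressing whatever crosses the bus. The following is a minimal CPU-only sketch of that combination, assuming a 1D three-point averaging stencil with fixed boundary cells. The "device" is just a Python list, lossless `zlib` stands in for whatever GPU compressor the paper actually uses, and the names `sweep`, `pack`, `unpack`, `reference`, and `out_of_core` are hypothetical. It does not model the paper's single-working-buffer strategy.

```python
import struct
import zlib


def sweep(u):
    """One Jacobi sweep of a 3-point averaging stencil; both end cells are copied unchanged."""
    if len(u) < 3:
        return u[:]
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def reference(u, steps):
    """Naive in-core time loop, used to check the out-of-core result."""
    for _ in range(steps):
        u = sweep(u)
    return u


def pack(values):
    """Serialize doubles and compress them before a simulated bus transfer (lossless zlib stand-in)."""
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))


def unpack(blob):
    """Decompress a transferred blob back into a list of doubles."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def out_of_core(host, steps, chunk, k):
    """Advance `host` by `steps` sweeps, streaming `chunk` cells at a time
    through a simulated device and fusing up to k sweeps per round trip.

    Returns (result, compressed_bytes_moved, uncompressed_bytes_equivalent).
    """
    n = len(host)
    moved = raw = 0
    done = 0
    while done < steps:
        kk = min(k, steps - done)
        # Write results to a copy so later chunks still read old-time halo values.
        new_host = host[:]
        for lo in range(0, n, chunk):
            hi = min(lo + chunk, n)
            # Halo of kk cells per side, clipped at the global boundary.
            a, b = max(0, lo - kk), min(n, hi + kk)
            blob = pack(host[a:b])            # host -> device, compressed
            moved += len(blob)
            raw += 8 * (b - a)
            dev = unpack(blob)
            for _ in range(kk):               # temporal blocking: kk sweeps on the device
                dev = sweep(dev)
            out = pack(dev[lo - a:hi - a])    # device -> host: only the still-valid core
            moved += len(out)
            raw += 8 * (hi - lo)
            new_host[lo:hi] = unpack(out)
        host = new_host
        done += kk
    return host, moved, raw


if __name__ == "__main__":
    grid = [100.0] + [0.0] * 3999  # hot left boundary, cold interior
    for k in (1, 4):
        result, moved, raw = out_of_core(grid, 8, 1000, k)
        assert result == reference(grid, 8)
        print(f"k={k}: {moved} compressed bytes vs {raw} raw bytes")
```

Because each chunk carries a halo of `k` cells per side that shrinks by one cell per sweep, the blocked result matches the naive time loop exactly, while a larger `k` means fewer round trips. For a stencil of radius r the halo would grow to k·r cells per side, so high-order stencils need more device memory per block for the same number of fused steps.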

Pages: 11055-11077

Number of pages: 22
