A compression-based memory-efficient optimization for out-of-core GPU stencil computation

Authors
Jingcheng Shen
Linbo Long
Xin Deng
Masao Okita
Fumihiko Ino
Affiliations
[1] Chongqing University of Posts and Telecommunications, College of Computer Science and Technology
[2] Osaka University, Graduate School of Information Science and Technology
Keywords
On-the-fly compression; Stencil computation; Out-of-core; GPU
DOI
Not available
Abstract
A code for out-of-core stencil computation manages data that exceeds the memory capacity of a GPU. However, such a code requires frequent data transfers between the CPU and GPU, which often impede overall performance. In this work, we propose a compression-based, memory-efficient method to accelerate out-of-core stencil codes. First, an on-the-fly compression technique is integrated into the out-of-core computation to reduce CPU-GPU data transfers. Second, a single-working-buffer strategy is employed to reduce GPU memory usage, so that more data can be stored on the GPU for reuse and the number of temporal blocking steps can be increased. Experimental results demonstrate that the proposed method reduces GPU memory usage by 21%, thereby creating space for doubling the number of temporal blocking steps compared to the codes without compression. The proposed method has been shown to help high-order, data-transfer-bound stencil codes achieve speedups of up to 2.09× for single-precision floating-point format and up to 1.92× for double-precision floating-point format on an NVIDIA Tesla V100 GPU in comparison with the codes without compression.
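The link between temporal blocking steps and CPU-GPU transfers can be illustrated with a small CPU-only sketch. It is not the authors' code: the 3-point averaging stencil, the function names (step, in_core, out_of_core), and the chunk sizes are assumptions chosen for illustration, and both the compression stage and the single-working-buffer strategy are omitted. Each chunk is loaded once with a halo of k cells per side, advanced k time steps locally, and written back, so the number of chunk loads falls roughly by a factor of k.

```python
def step(u):
    """One Jacobi step of a 3-point averaging stencil; end cells are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def in_core(u, steps):
    """Reference: the whole domain fits in memory and is advanced step by step."""
    for _ in range(steps):
        u = step(u)
    return u


def out_of_core(u, total_steps, chunk, k):
    """Advance u chunk by chunk, k time steps per chunk load (temporal blocking).

    Returns the result and the number of chunk loads, which stands in for
    host-to-device transfers in this CPU-only illustration.
    """
    n = len(u)
    loads = 0
    for t0 in range(0, total_steps, k):
        kk = min(k, total_steps - t0)   # steps performed in this sweep
        halo = kk                       # stencil radius 1 times kk steps
        nxt = u[:]
        for s in range(0, n, chunk):
            e = min(s + chunk, n)
            lo, hi = max(0, s - halo), min(n, e + halo)
            block = u[lo:hi]            # stands in for a host-to-device copy
            loads += 1
            for _ in range(kk):
                block = step(block)     # halo cells go stale, interior stays exact
            nxt[s:e] = block[s - lo:e - lo]  # stands in for a device-to-host copy
        u = nxt
    return u, loads


if __name__ == "__main__":
    u0 = [float((i * 7) % 11) for i in range(64)]
    ref = in_core(u0, 8)
    for k in (1, 4):
        out, loads = out_of_core(u0, 8, 16, k)
        err = max(abs(a - b) for a, b in zip(ref, out))
        print(k, loads, err < 1e-12)
    # prints "1 32 True" then "4 8 True"
```

The halo grows with k, so each load moves more data and needs more device memory. This is why, under the assumptions of this sketch, freeing GPU memory leaves room for more temporal blocking steps.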
Pages: 11055-11077
Page count: 22