A compression-based memory-efficient optimization for out-of-core GPU stencil computation

Cited by: 0
Authors
Jingcheng Shen
Linbo Long
Xin Deng
Masao Okita
Fumihiko Ino
Affiliations
[1] Chongqing University of Posts and Telecommunications, College of Computer Science and Technology
[2] Osaka University, Graduate School of Information Science and Technology
Source
Keywords
On-the-fly compression; Stencil computation; Out-of-core; GPU;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
A code for out-of-core stencil computation manages data that exceeds the memory capacity of a GPU. However, such a code requires frequent data transfers between the CPU and GPU, which often impede overall performance. In this work, we propose a compression-based, memory-efficient method to accelerate out-of-core stencil codes. First, an on-the-fly compression technique is integrated into the out-of-core computation to reduce CPU-GPU data transfers. Second, a single-working-buffer strategy is employed to reduce GPU memory usage, which allows more data to be kept on the GPU for reuse and thus increases the number of temporal blocking steps. Experimental results demonstrate that the proposed method reduces GPU memory usage by 21%, creating space to double the number of temporal blocking steps compared to the codes without compression. The proposed method is shown to help high-order, data-transfer-bound stencil codes achieve speedups of up to 2.09× for the single-precision floating-point format and up to 1.92× for the double-precision floating-point format on an NVIDIA Tesla V100 GPU, in comparison with the codes without compression.
Pages: 11055 - 11077
Page count: 22
Related papers
42 in total
  • [1] A compression-based memory-efficient optimization for out-of-core GPU stencil computation
    Shen, Jingcheng
    Long, Linbo
    Deng, Xin
    Okita, Masao
    Ino, Fumihiko
    [J]. JOURNAL OF SUPERCOMPUTING, 2023, 79 (10): : 11055 - 11077
  • [2] Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression
    Shen, Jingcheng
    Wu, Yifan
    Okita, Masao
    Ino, Fumihiko
    [J]. PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT 2021, 2022, 13148 : 3 - 14
  • [3] A Data-Centric Directive-Based Framework to Accelerate Out-of-Core Stencil Computation on a GPU
    Shen, Jingcheng
    Ino, Fumihiko
    Farres, Albert
    Hanzich, Mauricio
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (12): : 2421 - 2434
  • [4] A data-centric directive-based framework to accelerate out-of-core stencil computation on a GPU
    Shen, Jingcheng
    Ino, Fumihiko
    Farrés, Albert
    Hanzich, Mauricio
    [J]. IEICE Transactions on Information and Systems, 2020, E103D (12): : 2421 - 2434
  • [5] An Extension of OpenACC Directives for Out-of-Core Stencil Computation with Temporal Blocking
    Miki, Nobuhiro
    Ino, Fumihiko
    Hagihara, Kenichi
    [J]. PROCEEDINGS OF WACCPD 2016: THIRD WORKSHOP ON ACCELERATOR PROGRAMMING USING DIRECTIVES, 2016, : 36 - 45
  • [6] Evaluation of Flash-based Out-of-core Stencil Computation Algorithms for SSD-Equipped Clusters
    Midorikawa, Hiroko
    Tan, Hideyuki
    [J]. 2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, : 1031 - 1040
  • [7] AN EFFICIENT OUT-OF-CORE VOLUME RENDERING METHOD BASED ON RAY CASTING AND GPU ACCELERATION
    Xue, Jian
    Lue, Ke
    Tian, Jie
    [J]. 2009 IEEE YOUTH CONFERENCE ON INFORMATION, COMPUTING AND TELECOMMUNICATION, PROCEEDINGS, 2009, : 130 - +
  • [8] Efficient Utilization of Memory Hierarchy to Enable the Computation on Bigger Domains for Stencil Computation in CPU-GPU Based Systems
    Jin, Guanghao
    Lin, James
    Endo, Toshio
    [J]. 2014 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND APPLICATIONS (ICHPCA), 2014,
  • [9] A Parallel Memory Efficient Framework for Out-of-Core Mesh simplification
    Lu Yongquan
    Li Nan
    Gao Pengdong
    Qiu Chu
    Wang Jintao
    Lv Rui
    [J]. HPCC: 2009 11TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2009, : 666 - 671
  • [10] FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks
    Patel, Shaurya
    Liu, Tongping
    Guan, Hui
    [J]. PROCEEDINGS OF MCHPC 2021: WORKSHOP ON MEMORY CENTRIC HIGH PERFORMANCE COMPUTING, 2021, : 1 - 8