A compression-based memory-efficient optimization for out-of-core GPU stencil computation

Cited by: 1
Authors
Shen, Jingcheng [1]
Long, Linbo [1]
Deng, Xin [1]
Okita, Masao [2]
Ino, Fumihiko [2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, 2 Chongwen Rd, Chongqing 400065, Peoples R China
[2] Osaka Univ, Grad Sch Informat Sci & Technol, 1-5 Yamadaoka, Suita, Osaka 5650871, Japan
Source
Journal of Supercomputing | 2023 / Vol. 79 / No. 10
Funding
Japan Society for the Promotion of Science; National Natural Science Foundation of China
Keywords
On-the-fly compression; Stencil computation; Out-of-core; GPU; LOSSY COMPRESSION; ALGORITHM;
DOI
10.1007/s11227-023-05103-8
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A code for out-of-core stencil computation manages data that exceeds the memory capacity of a GPU. However, such a code necessitates frequent data transfers between the CPU and GPU, which often impede overall performance. In this work, we propose a compression-based, memory-efficient method to accelerate out-of-core stencil codes. First, an on-the-fly compression technique is integrated into the out-of-core computation to reduce CPU-GPU data transfers. Second, a single-working-buffer strategy is employed to reduce GPU memory usage, enabling more data to be stored on the GPU for reuse and thus allowing more temporal blocking steps. Experimental results demonstrate that the proposed method reduces GPU memory usage by 21%, thereby creating space for doubling the number of temporal blocking steps compared to the codes without compression. The proposed method has been shown to help high-order, data-transfer-bound stencil codes achieve speedups of up to 2.09x for the single-precision floating-point format and up to 1.92x for the double-precision floating-point format on an NVIDIA Tesla V100 GPU, in comparison with the codes without compression.
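The two ideas named in the abstract, temporal blocking and compressed transfers, can be illustrated with a small CPU-only sketch. It is not the authors' implementation: the 1D 3-point averaging stencil, the function names (`stencil_step`, `pack`, `unpack`, `out_of_core_sweep`), and the use of lossless `zlib` as the codec are all assumptions made for illustration, and the "GPU" is simulated by ordinary Python lists. Each chunk is moved together with a halo of width k so that k time steps can be applied per transfer, and only compressed bytes cross the simulated link. The sketch assumes a domain of at least two cells.

```python
import struct
import zlib


def stencil_step(u):
    """One 3-point averaging step; the two end values are held fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def pack(values):
    """Compress a list of floats (lossless zlib as a stand-in codec)."""
    return zlib.compress(struct.pack(f"{len(values)}d", *values))


def unpack(blob):
    """Inverse of pack()."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"{len(raw) // 8}d", raw))


def out_of_core_sweep(host, chunk, k):
    """Advance `host` by k time steps, one chunk (plus halo) at a time."""
    n = len(host)
    result = [0.0] * n
    moved = 0  # compressed bytes crossing the simulated CPU-GPU link
    for lo in range(0, n, chunk):
        hi = min(lo + chunk, n)
        # Halo of width k per side, clipped at the physical boundaries.
        a, b = max(lo - k, 0), min(hi + k, n)
        blob_in = pack(host[a:b])              # simulated host -> device
        block = unpack(blob_in)
        for _ in range(k):                     # k steps per transfer
            block = stencil_step(block)
        # Only the chunk interior is valid after k steps; the halo is dropped.
        blob_out = pack(block[lo - a:hi - a])  # simulated device -> host
        result[lo:hi] = unpack(blob_out)
        moved += len(blob_in) + len(blob_out)
    return result, moved
```

Calling `out_of_core_sweep(host, 16, 4)` on a 64-element list should give the same values as applying `stencil_step` four times to the whole list, because errors introduced at a cut edge travel at most one cell per step and therefore stay inside the discarded halo. A larger k means fewer transfers per simulated time step but a wider halo per chunk, which is the trade-off that freed device memory would relax.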
Pages: 11055-11077
Number of pages: 23