An Extension of OpenACC Directives for Out-of-Core Stencil Computation with Temporal Blocking

Cited by: 0
Authors
Miki, Nobuhiro [1]
Ino, Fumihiko [1]
Hagihara, Kenichi [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, 1-5 Yamadaoka, Suita, Osaka 5650871, Japan
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency
DOI
10.1109/WACCPD.2016.10
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
In this paper, aiming at realizing directive-based temporal blocking for out-of-core stencil computation, we present an extension of OpenACC directives and a source-to-source translator capable of accelerating out-of-core stencil computation on a graphics processing unit (GPU). Out-of-core stencil computation here deals with large data that cannot be entirely stored in GPU memory. Given an OpenACC-like code, the proposed translator generates an OpenACC code such that it decomposes large data into smaller chunks, which are then processed in a pipelined manner to hide the data transfer overhead needed for exchanging chunks between the GPU memory and CPU memory. Furthermore, the generated code is optimized with a temporal blocking technique to minimize the amount of CPU-GPU data transfer. In experiments, we apply the proposed translator to three stencil computation codes. The out-of-core performance on a Tesla K40 GPU reaches 73.4 GFLOPS, which is only 13% lower than the in-core performance. Therefore, we think that our directive-based approach is useful for facilitating out-of-core stencil computation on a GPU.
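
The chunk decomposition and temporal blocking described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the output of the authors' translator and uses no OpenACC. The 1D 3-point Jacobi stencil, the function names (`jacobi_step`, `in_core`, `out_of_core`), and the parameters `chunk` and `k` are all assumptions chosen for illustration. List slicing stands in for CPU-GPU copies, and the pipelined overlap of transfer and computation is not modeled. Each chunk is loaded with a halo of `k` cells per side, so `k` time steps can be applied before the chunk is written back.

```python
def jacobi_step(a):
    """One 3-point Jacobi sweep; the two end cells are left unchanged."""
    out = a[:]
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def in_core(a, steps):
    """Reference: the whole array fits in (hypothetical) device memory."""
    for _ in range(steps):
        a = jacobi_step(a)
    return a


def out_of_core(a, steps, chunk, k):
    """Process `a` chunk by chunk, advancing up to k time steps per visit.

    Returns the result and the number of simulated host-to-device copies.
    """
    n = len(a)
    transfers = 0
    done = 0
    while done < steps:
        kk = min(k, steps - done)        # time steps in this temporal block
        nxt = a[:]                       # host-side buffer for the results
        for start in range(0, n, chunk):
            end = min(start + chunk, n)
            lo = max(start - kk, 0)      # halo of kk cells, clipped to domain
            hi = min(end + kk, n)
            local = a[lo:hi]             # stands in for a host-to-device copy
            transfers += 1
            for _ in range(kk):
                # Stale halo edges creep inward by one cell per step,
                # so after kk steps the chunk interior is still exact.
                local = jacobi_step(local)
            nxt[start:end] = local[start - lo:end - lo]  # device-to-host copy
        a = nxt
        done += kk
    return a, transfers


if __name__ == "__main__":
    data = [float((i * 7) % 13) for i in range(64)]
    blocked, n_blocked = out_of_core(data, steps=12, chunk=16, k=4)
    naive, n_naive = out_of_core(data, steps=12, chunk=16, k=1)
    print(blocked == in_core(data, 12), n_blocked, n_naive)
```

In this sketch, raising `k` reduces the number of chunk transfers at the cost of redundant computation in the halo cells, which is the trade-off that temporal blocking exploits.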
Pages: 36-45
Number of pages: 10