Optimal Temporal Blocking for Stencil Computation

Times cited: 16
Authors
Muranushi, Takayuki [1]
Makino, Junichiro [1]
Affiliation
Keywords
Parallel computation; Stencil computation; Optimization; Locality
DOI
10.1016/j.procs.2015.05.315
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Temporal blocking is a class of algorithms that reduces the memory bandwidth required by a given stencil computation, expressed as the byte-per-flop (B/F) ratio, by "blocking" multiple time steps. In this paper, we prove that, under certain conditions, there is a lower limit to the B/F reduction attainable by temporal blocking. We introduce PiTCH tiling, an example of a temporal blocking method that achieves the optimal B/F ratio. We estimate the performance of PiTCH tiling for various stencil applications on several modern CPUs. We show that PiTCH tiling achieves a 1.5 to 2 times better B/F reduction in three-dimensional applications than other temporal blocking schemes. We also show that PiTCH tiling can remove the bandwidth bottleneck from most of the stencil applications considered.
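The abstract describes temporal blocking only in outline, and this record does not specify how PiTCH tiling works. As a hedged illustration of the general idea (explicitly not PiTCH tiling), the sketch below applies a simple overlapped ("ghost-zone") temporal blocking scheme to a hypothetical 1D three-point averaging stencil and compares a crude main-memory traffic model against one full sweep per time step. The function names `jacobi_step`, `naive_sweeps` and `overlapped_temporal_blocking`, the tile size, the blocking depth, and the assumption that each tile plus its halo stays cache-resident are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

BYTES_PER_VALUE = 8  # float64 (assumed data type)


def jacobi_step(u):
    """One step of a hypothetical 1D three-point averaging stencil.
    Both endpoints are held fixed (Dirichlet boundaries)."""
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v


def naive_sweeps(u, nsteps):
    """One full-array sweep per time step. Traffic model (an assumption):
    every value is read from and written to main memory once per step."""
    for _ in range(nsteps):
        u = jacobi_step(u)
    traffic = nsteps * 2 * len(u) * BYTES_PER_VALUE
    return u, traffic


def overlapped_temporal_blocking(u, nsteps, tile, depth):
    """Overlapped temporal blocking (a generic scheme, not PiTCH tiling).

    Each tile is loaded once together with a halo of `depth` cells per side
    and advanced up to `depth` steps on the (assumed cache-resident) copy.
    Halo cells that are not global boundaries go stale, so the correct region
    shrinks by one cell per side per step; after k <= depth steps exactly the
    tile interior is still correct, and only that part is written back."""
    n = len(u)
    traffic = 0
    done = 0
    while done < nsteps:
        k = min(depth, nsteps - done)
        out = np.empty_like(u)
        for start in range(0, n, tile):
            end = min(start + tile, n)
            lo, hi = max(start - k, 0), min(end + k, n)
            local = u[lo:hi].copy()                      # one load per block
            traffic += (hi - lo) * BYTES_PER_VALUE
            for _ in range(k):                           # modelled as no memory traffic
                local = jacobi_step(local)
            out[start:end] = local[start - lo:end - lo]  # keep only valid cells
            traffic += (end - start) * BYTES_PER_VALUE   # one store per block
        u = out
        done += k
    return u, traffic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u0 = rng.random(10_000)
    ref, naive_bytes = naive_sweeps(u0, nsteps=8)
    blk, blk_bytes = overlapped_temporal_blocking(u0, nsteps=8, tile=1000, depth=8)
    assert np.allclose(ref, blk)      # same result, different traffic
    updates = 8 * (len(u0) - 2)       # interior point updates
    print("naive bytes/update:  ", naive_bytes / updates)
    print("blocked bytes/update:", blk_bytes / updates)
```

Dividing bytes per update by flops per update gives the B/F ratio. Assuming three flops per update here (two additions and one division), blocking 8 steps with 1000-cell tiles lowers the modelled traffic from about 16 to about 2 bytes per update, i.e. the modelled B/F falls by roughly the blocking depth. The halo cells each tile loads and recomputes are the overhead that keeps this simple scheme from reaching an ideal reduction; the paper's lower bound on attainable B/F, and PiTCH tiling as a scheme reported to reach it, are not reproduced by this sketch.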
Pages: 1303-1312
Number of pages: 10
Related papers
50 in total
  • [1] Efficient Stencil Computation with Temporal Blocking by Halide DSL
    Aikawa, Hiroki
    Endo, Toshio
    Yuki, Tomoya
    Hirofuchi, Takahiro
    Ikegami, Tsutomu
    [J]. 2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022, : 870 - 877
  • [2] An Extension of OpenACC Directives for Out-of-Core Stencil Computation with Temporal Blocking
    Miki, Nobuhiro
    Ino, Fumihiko
    Hagihara, Kenichi
    [J]. PROCEEDINGS OF WACCPD 2016: THIRD WORKSHOP ON ACCELERATOR PROGRAMMING USING DIRECTIVES, 2016, : 36 - 45
  • [3] Revisiting Temporal Blocking Stencil Optimizations
    Zhang, Lingqi
    Wahib, Mohamed
    Chen, Peng
    Meng, Jintao
    Wang, Xiao
    Endo, Toshio
    Matsuoka, Satoshi
    [J]. PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023, 2023, : 251 - 263
  • [4] Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL
    Zohouri, Hamid Reza
    Podobas, Artur
    Matsuoka, Satoshi
    [J]. PROCEEDINGS OF THE 2018 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'18), 2018, : 153 - 162
  • [5] Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy
    Endo, Toshio
    [J]. 2018 7TH IEEE NON-VOLATILE MEMORY SYSTEMS AND APPLICATIONS SYMPOSIUM (NVMSA 2018), 2018, : 19 - 24
  • [6] Accelerating Stencil Computations on a GPU by Combining Using Tensor Cores and Temporal Blocking
    Kambe, Futa
    Endo, Toshio
    [J]. 16TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPU, GPGPU 2024, 2024, : 1 - 6
  • [7] Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
    Wellein, Gerhard
    Hager, Georg
    Zeiser, Thomas
    Wittmann, Markus
    Fehske, Holger
    [J]. 2009 IEEE 33RD INTERNATIONAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2009, : 579 - +
  • [8] LEVERAGING SHARED CACHES FOR PARALLEL TEMPORAL BLOCKING OF STENCIL CODES ON MULTICORE PROCESSORS AND CLUSTERS
    Wittmann, Markus
    Hager, Georg
    Treibig, Jan
    Wellein, Gerhard
    [J]. PARALLEL PROCESSING LETTERS, 2010, 20 (04) : 359 - 376
  • [9] On the Transformation Optimization for Stencil Computation
    Su, Huayou
    Zhang, Kaifang
    Mei, Songzhu
    [J]. ELECTRONICS, 2022, 11 (01)
  • [10] Locality of Computation for Stencil Optimization
    Yuan, Lufeng
    Liu, Junhong
    Luo, Yulong
    Tan, Guangming
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2016, 2016, 10048 : 449 - 456