The memory behavior of cache oblivious stencil computations

Cited by: 0
Authors
Matteo Frigo
Volker Strumpen
Affiliation
[1] IBM Austin Research Laboratory,
Source
Keywords
Cache oblivious algorithms; Stencil computations; Analysis of algorithms; Performance analysis; System simulation;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
We present and evaluate a cache oblivious algorithm for stencil computations, which arise for example in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimensional spaces. On an “ideal cache” of size Z, our algorithm saves a factor of Θ(Z^{1/n}) cache misses compared to a naive algorithm, and it exploits temporal locality optimally throughout the entire memory hierarchy. We evaluate our algorithm in terms of the number of cache misses, and demonstrate that the memory behavior agrees with our theoretical predictions. Our experimental evaluation is based on a finite-difference solution of a heat diffusion problem, as well as a Gauss-Seidel iteration and a 2-dimensional LBMHD program, both reformulated as cache oblivious stencil computations.
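The abstract does not spell out the algorithm itself. As a rough illustration only, the following Python sketch shows one way a cache oblivious traversal of a 1-dimensional, 3-point heat-diffusion stencil can be organized: space-time is cut recursively into trapezoids, with no cache size appearing anywhere in the code. The function names (`kernel`, `walk1`, `naive_sweep`, `make_grid`), the 0.25 diffusion coefficient, the fixed boundary values, and the initial data are all assumptions made for this example and are not taken from this record.

```python
def make_grid(n):
    # Two time buffers (t mod 2) holding the same arbitrary initial data.
    # Cells 0 and n-1 act as fixed boundary values and are never written.
    row = [float((7 * i) % 11) for i in range(n)]
    return [row[:], row[:]]


def kernel(u, t, x):
    # Assumed 3-point heat stencil: compute point x at time t+1 from time t.
    cur, nxt = u[t % 2], u[(t + 1) % 2]
    nxt[x] = cur[x] + 0.25 * (cur[x - 1] - 2.0 * cur[x] + cur[x + 1])


def naive_sweep(u, steps):
    # Reference order: sweep the whole interior once per time step.
    n = len(u[0])
    for t in range(steps):
        for x in range(1, n - 1):
            kernel(u, t, x)


def walk1(u, t0, t1, x0, dx0, x1, dx1):
    # Visit the trapezoid with time range [t0, t1) whose left edge starts at
    # x0 with slope dx0 and whose right edge starts at x1 with slope dx1.
    dt = t1 - t0
    if dt == 1:
        for x in range(x0, x1):
            kernel(u, t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
            # Wide trapezoid: space cut along a line of slope -1.
            # The left part must run first; the right part depends on it.
            xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
            walk1(u, t0, t1, x0, dx0, xm, -1)
            walk1(u, t0, t1, xm, -1, x1, dx1)
        else:
            # Tall trapezoid: time cut, bottom half before top half.
            s = dt // 2
            walk1(u, t0, t0 + s, x0, dx0, x1, dx1)
            walk1(u, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)


if __name__ == "__main__":
    n, steps = 64, 40
    a, b = make_grid(n), make_grid(n)
    naive_sweep(a, steps)
    walk1(b, 0, steps, 1, 0, n - 1, 0)  # interior points 1 .. n-2
    same = all(abs(p - q) < 1e-12 for p, q in zip(a[steps % 2], b[steps % 2]))
    print("results agree:", same)
```

Both traversals apply the same kernel to the same points; only the visiting order differs. The recursive order keeps working on a small space-time region before moving on, which is the source of the temporal locality the abstract refers to.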
Pages: 93 - 112
Page count: 19
Related papers
50 in total
  • [1] The memory behavior of cache oblivious stencil computations
    Frigo, Matteo
    Strumpen, Volker
    [J]. JOURNAL OF SUPERCOMPUTING, 2007, 39 (02): : 93 - 112
  • [2] Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model
    Stengel, Holger
    Treibig, Jan
    Hager, Georg
    Wellein, Gerhard
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15), 2015, : 207 - 216
  • [3] Casper: Accelerating Stencil Computations Using Near-Cache Processing
    Denzler, Alain
    Oliveira, Geraldo F.
    Hajinazar, Nastaran
    Bera, Rahul
    Singh, Gagandeep
    Gomez-Luna, Juan
    Mutlu, Onur
    [J]. IEEE ACCESS, 2023, 11 : 22136 - 22154
  • [4] Cache-Oblivious Algorithms and Matrix Formats for Computations on Interval Matrices
    Dabrowski, Rafal
    Kubica, Bartlomiej Jacek
    [J]. APPLIED PARALLEL AND SCIENTIFIC COMPUTING, PT II, 2012, 7134 : 269 - 279
  • [5] PIMS: A Lightweight Processing-in-Memory Accelerator for Stencil Computations
    Li, Jie
    Wang, Xi
    Tumeo, Antonino
    Williams, Brody
    Leidel, John D.
    Chen, Yong
    [J]. MEMSYS 2019: PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS, 2019, : 41 - 52
  • [6] A Distributed Memory Based Embedded CGRA for Accelerating Stencil Computations
    Takeuchi, Shohei
    Yuttakonkit, Yuttakon
    Takamaeda-Yamazaki, Shinya
    Nakashima, Yasuhiko
    [J]. PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2015, : 385 - 391
  • [7] Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations
    Malas, Tareq M.
    Hager, Georg
    Ltaief, Hatem
    Keyes, David E.
    [J]. ACM TRANSACTIONS ON PARALLEL COMPUTING, 2018, 4 (03)
  • [8] Cache oblivious algorithms
    Kumar, P
    [J]. ALGORITHMS FOR MEMORY HIERARCHIES: ADVANCED LECTURES, 2003, 2625 : 193 - 212
  • [9] Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy
    Endo, Toshio
    [J]. 2018 7TH IEEE NON-VOLATILE MEMORY SYSTEMS AND APPLICATIONS SYMPOSIUM (NVMSA 2018), 2018, : 19 - 24
  • [10] Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs
    Nasciutti, Thiago Carrijo
    Panetta, Jairo
    Lopes, Pedro Pais
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (18):