Improving cache locality with blocked array layouts

Cited by: 1
Authors
Athanasaki, E [1]
Koziris, N [1]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, Comp Syst Lab, GR-15773 Zografos, Greece
DOI
10.1109/EMPDP.2004.1271460
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Minimizing cache misses is one of the most important ways to reduce the average latency of memory accesses. Tiled codes modify the instruction stream to exploit cache locality for array accesses. In this paper we further reduce cache misses by restructuring the memory layout of multidimensional arrays that are accessed by tiled code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to translate multidimensional array indices into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are easily calculated based on the algebra of dilated integers, similarly to Morton-order indexing. Experimental results, using matrix multiplication and LU decomposition on arrays of various sizes, show that execution time improves greatly when tiled code is combined with tiled array layouts and binary mask-based index translation functions. Simulations using the SimpleScalar tool verify that the enhanced performance is due to a considerable reduction in total cache misses.
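The index translation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`blocked_offset`, `dilate`, `morton_offset`), the square n x n array, the power-of-two sizes, and the row-major ordering of tiles and of elements within a tile are all assumptions made for illustration. The first function computes a blocked-layout offset with masks and shifts only; the other two show Morton-order indexing via dilated integers for comparison.

```python
def blocked_offset(i, j, n, t):
    """Offset of element (i, j) in an n x n array stored tile by tile.

    Assumption: n and t are powers of two with t <= n; tiles are laid out
    row-major and elements inside each tile are row-major as well.
    """
    lo_mask = t - 1              # selects the position inside a tile
    hi_mask = ~lo_mask           # selects the tile coordinate (scaled by t)
    shift = t.bit_length() - 1   # log2(t)
    # Start of the tile containing (i, j): a multiple of t * t.
    tile_part = (i & hi_mask) * n + ((j & hi_mask) << shift)
    # Position inside the tile: always smaller than t * t.
    inner_part = ((i & lo_mask) << shift) | (j & lo_mask)
    return tile_part + inner_part


def dilate(x):
    """Spread the low 16 bits of x so that a zero bit follows each bit."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton_offset(i, j):
    """Morton (Z-order) offset: row bits interleaved with column bits."""
    return (dilate(i) << 1) | dilate(j)
```

With an 8 x 8 array and 4 x 4 tiles, `blocked_offset(0, 4, 8, 4)` lands at the start of the second tile and `blocked_offset(4, 0, 8, 4)` at the start of the third, so a tiled loop nest that sweeps one tile at a time touches one contiguous run of memory per tile.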
Pages: 308-317
Page count: 10
Related papers (showing 10 of 50)
  • [1] Fast indexing for blocked array layouts to improve multi-level cache locality
    Athanasaki, E
    Koziris, N
    EIGHTH WORKSHOP ON INTERACTION BETWEEN COMPILERS AND COMPUTER ARCHITECTURES, PROCEEDINGS, 2004, : 109 - 119
  • [2] IMPROVING THE CACHE LOCALITY OF MEMORY ALLOCATION
    GRUNWALD, D
    ZORN, B
    HENDERSON, R
    SIGPLAN NOTICES, 1993, 28 (06): : 177 - 186
  • [3] A tile size selection analysis for blocked array layouts
    Athanasaki, E
    Koziris, N
    Tsanakas, P
    9TH ANNUAL WORKSHOP ON INTERACTION BETWEEN COMPILERS AND COMPUTER ARCHITECTURES, PROCEEDINGS, 2005, : 70 - 80
  • [4] Tuning blocked array layouts to exploit memory hierarchy in SMT architectures
    Athanasaki, E
    Kourtis, K
    Anastopoulos, N
    Koziris, N
    ADVANCES IN INFORMATICS, PROCEEDINGS, 2005, 3746 : 600 - 610
  • [5] Improving cache locality by a combination of loop and data transformations
    Kandemir, M
    Ramanujam, J
    Choudhary, A
    IEEE TRANSACTIONS ON COMPUTERS, 1999, 48 (02) : 159 - 167
  • [6] Improving Test Execution Time with Improved Cache Locality
    Stratis, Panagiotis
    PROCEEDINGS OF THE 2017 IEEE/ACM 39TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING COMPANION (ICSE-C 2017), 2017, : 82 - 84
  • [7] Reshape your layouts, not your programs: A safe language extension for better cache locality
    Tasos, Alexandros
    Franco, Juliana
    Drossopoulou, Sophia
    Wrigstad, Tobias
    Eisenbach, Susan
    SCIENCE OF COMPUTER PROGRAMMING, 2020, 197
  • [8] Improving data locality by array contraction
    Song, YH
    Xu, R
    Wang, C
    Li, ZY
    IEEE TRANSACTIONS ON COMPUTERS, 2004, 53 (09) : 1073 - 1084
  • [9] Improving cache locality for GPU-based volume rendering
    Sugimoto, Yuki
    Ino, Fumihiko
    Hagihara, Kenichi
    PARALLEL COMPUTING, 2014, 40 (5-6) : 59 - 69
  • [10] A parametrized loop fusion algorithm for improving parallelism and cache locality
    Singhai, SK
    McKinley, KS
    COMPUTER JOURNAL, 1997, 40 (06): : 340 - 355