Reducing Load Latency with Cache Level Prediction

Cited by: 6
Authors
Jalili, Majid [1 ]
Erez, Mattan [1 ]
Institution
[1] Univ Texas Austin, Austin, TX 78712 USA
Funding
U.S. National Science Foundation
Keywords
SPECULATION;
DOI
10.1109/HPCA53966.2022.00054
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline Code
0812
Abstract
High load latency resulting from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetching helps reduce this latency by fetching data up the hierarchy before it is requested by load instructions. However, data prefetching has been shown to be imperfect in many situations. We propose cache-level prediction to complement prefetchers. Our method predicts which memory-hierarchy level a load will access, allowing the access to begin at that level earlier and thereby saving many cycles. The predictor provides high prediction accuracy at the cost of just one cycle of added latency on L1 misses. Level prediction reduces memory access latency by 20% on average and provides a speedup of 10.3% over a conventional baseline and 6.1% over a boosted baseline on generic, graph, and HPC applications.
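To make the idea concrete, here is a minimal illustrative sketch (not the paper's actual design) of a PC-indexed cache-level predictor: each table entry remembers the hierarchy level (L1, L2, L3, or DRAM) that last serviced loads from that program counter, and a lookup predicts that level so the access can be launched there directly. The class name, table size, and indexing scheme are assumptions for illustration only.

```python
class CacheLevelPredictor:
    """Illustrative PC-indexed last-level predictor (assumed design,
    not the mechanism from the paper)."""

    LEVELS = ("L1", "L2", "L3", "DRAM")

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = ["L1"] * entries  # untrained entries default to an L1 hit

    def _index(self, pc):
        # Drop the low instruction-offset bits, then index into the table.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Predict the level that last serviced a load from this PC.
        return self.table[self._index(pc)]

    def update(self, pc, actual_level):
        # Train with the level that actually serviced the load.
        assert actual_level in self.LEVELS
        self.table[self._index(pc)] = actual_level


pred = CacheLevelPredictor()
pred.update(0x400848, "L3")       # a load at this PC was serviced by L3
print(pred.predict(0x400848))     # -> L3: launch the access at L3 next time
print(pred.predict(0x40084C))     # -> L1: untrained PCs fall back to L1
```

A real implementation would also need a recovery path for mispredictions (the access must fall back through the hierarchy when the predicted level misses), which is where the paper's reported one-cycle L1-miss cost comes in.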
Pages: 648-661 (14 pages)
Related Papers (50 total)
  • [1] Hiding data cache latency with load address prediction
    Sato, Toshinori
    Fujii, Hiroshige
    Suzuki, Seigo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1996, E79-D (11): 1523-1532
  • [3] ATCache: Reducing DRAM Cache Latency via a Small SRAM Tag Cache
    Huang, Cheng-Chieh
    Nagarajan, Vijay
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014: 51-60
  • [4] On reducing load/store latencies of cache accesses
    Hwang, Yuan-Shin
    Li, Jia-Jhe
    JOURNAL OF SYSTEMS ARCHITECTURE, 2010, 56 (1): 1-15
  • [5] Reducing web latency with hierarchical cache-based prefetching
    Foygel, D
    Strelow, D
    2000 INTERNATIONAL WORKSHOPS ON PARALLEL PROCESSING, PROCEEDINGS, 2000: 103-108
  • [6] Reducing Latency in an SRAM/DRAM Cache Hierarchy via a Novel Tag-Cache Architecture
    Hameed, Fazal
    Bauer, Lars
    Henkel, Joerg
    2014 51ST ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2014
  • [7] Reducing cache traffic and energy with macro data load
    Jin, Lei
    Cho, Sangyeun
    ISLPED '06: PROCEEDINGS OF THE 2006 INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN, 2006: 147-150
  • [8] Codesign of NoC and Cache Organization for Reducing Access Latency in Chip Multiprocessors
    Abousamra, Ahmed
    Jones, Alex K.
    Melhem, Rami
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2012, 23 (6): 1038-1046
  • [9] A Tabu Based Cache to Improve Latency and Load Balancing on Prefix Trees
    Hidalgo, Nicolas
    Arantes, Luciana
    Sens, Pierre
    Bonnaire, Xavier
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011: 557-564
  • [10] Reducing network latency and server load in hypermedia systems
    BenAhmed, C
    Boudriga, N
    INFORMATION SCIENCES, 1997, 102 (1-4): 1-29