Reducing Load Latency with Cache Level Prediction

Cited by: 6
Authors
Jalili, Majid [1 ]
Erez, Mattan [1 ]
Institution
[1] Univ Texas Austin, Austin, TX 78712 USA
Funding
U.S. National Science Foundation
Keywords
SPECULATION;
DOI
10.1109/HPCA53966.2022.00054
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline Code
0812
Abstract
High load latency resulting from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetching helps reduce this latency by fetching data up the hierarchy before it is requested by load instructions. However, data prefetching has been shown to be imperfect in many situations. We propose cache-level prediction to complement prefetchers. Our method predicts which memory-hierarchy level a load will access, allowing the access to begin at that level earlier and thereby saving many cycles. The predictor provides high prediction accuracy at the cost of just one cycle of added latency on L1 misses. Level prediction reduces memory access latency by 20% on average and provides a speedup of 10.3% over a conventional baseline and 6.1% over a boosted baseline on generic, graph, and HPC applications.
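To make the idea concrete, here is a minimal illustrative sketch (not the paper's actual design) of a PC-indexed cache-level predictor: each table entry remembers the hierarchy level (L1, L2, L3, or DRAM) that last serviced loads from that program counter, and a lookup predicts that level so the access can be launched there directly. The class name, table size, and indexing scheme are assumptions for illustration only.

```python
class CacheLevelPredictor:
    """Illustrative PC-indexed last-level predictor (assumed design,
    not the mechanism from the paper)."""

    LEVELS = ("L1", "L2", "L3", "DRAM")

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = ["L1"] * entries  # untrained entries default to an L1 hit

    def _index(self, pc):
        # Drop the low instruction-offset bits, then index into the table.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Predict the level that last serviced a load from this PC.
        return self.table[self._index(pc)]

    def update(self, pc, actual_level):
        # Train with the level that actually serviced the load.
        assert actual_level in self.LEVELS
        self.table[self._index(pc)] = actual_level


pred = CacheLevelPredictor()
pred.update(0x400848, "L3")       # a load at this PC was serviced by L3
print(pred.predict(0x400848))     # -> L3: launch the access at L3 next time
print(pred.predict(0x40084C))     # -> L1: untrained PCs fall back to L1
```

A real implementation would also need a recovery path for mispredictions (the access must fall back through the hierarchy when the predicted level misses), which is where the paper's reported one-cycle L1-miss cost comes in.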
Pages: 648-661 (14 pages)
Related Papers (50 total)
  • [1] Hiding data cache latency with load address prediction
    Sato, Toshinori
    Fujii, Hiroshige
    Suzuki, Seigo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1996, E79-D (11): 1523-1532
  • [3] ATCache: Reducing DRAM Cache Latency via a Small SRAM Tag Cache
    Huang, Cheng-Chieh
    Nagarajan, Vijay
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014: 51-60
  • [4] On reducing load/store latencies of cache accesses
    Hwang, Yuan-Shin
    Li, Jia-Jhe
    JOURNAL OF SYSTEMS ARCHITECTURE, 2010, 56 (1): 1-15
  • [5] Reducing web latency with hierarchical cache-based prefetching
    Foygel, D
    Strelow, D
    2000 INTERNATIONAL WORKSHOPS ON PARALLEL PROCESSING, PROCEEDINGS, 2000: 103-108
  • [6] Reducing Latency in an SRAM/DRAM Cache Hierarchy via a Novel Tag-Cache Architecture
    Hameed, Fazal
    Bauer, Lars
    Henkel, Joerg
    2014 51ST ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2014
  • [7] Reducing cache traffic and energy with macro data load
    Jin, Lei
    Cho, Sangyeun
    ISLPED '06: PROCEEDINGS OF THE 2006 INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN, 2006: 147-150
  • [8] Codesign of NoC and Cache Organization for Reducing Access Latency in Chip Multiprocessors
    Abousamra, Ahmed
    Jones, Alex K.
    Melhem, Rami
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2012, 23 (6): 1038-1046
  • [9] A Tabu Based Cache to Improve Latency and Load Balancing on Prefix Trees
    Hidalgo, Nicolas
    Arantes, Luciana
    Sens, Pierre
    Bonnaire, Xavier
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011: 557-564
  • [10] Reducing network latency and server load in hypermedia systems
    BenAhmed, C
    Boudriga, N
    INFORMATION SCIENCES, 1997, 102 (1-4): 1-29