Locality-aware data replication in the last-level cache for large scale multicores

Cited by: 4

Authors:

Hijaz, Farrukh ^{[1]}

Shi, Qingchuan ^{[1]}

Kurian, George ^{[2,3]}

Devadas, Srinivas ^{[2]}

Khan, Omer ^{[1]}

Affiliations:

[1] Univ Connecticut, Storrs, CT USA

[2] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA

[3] Google, Mountain View, CA USA

Source:

JOURNAL OF SUPERCOMPUTING | 2016 / Vol. 72 / Issue 02

Funding:

National Science Foundation (USA);

Keywords:

Multicore; Cache hierarchy; Data management; Energy efficiency; CAPACITY ALLOCATION; CHIP; HIERARCHY; PLACEMENT; COHERENCE;

DOI:

10.1007/s11227-015-1608-4

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Next-generation large single-chip multicores will process massive data with varying degrees of locality. Harnessing on-chip data locality to optimize the utilization of on-chip cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). The goal is to lower memory access latency and energy by replicating only cache lines with high reuse in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. The approach relies on a low-overhead yet highly accurate in-hardware, runtime, cache-line-level classifier that only allows replication of cache lines with high reuse. Furthermore, the classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. On a set of parallel benchmarks, the proposed protocol reduces overall energy by 14.7%, 10.7%, 10.5%, and 16.7% and completion time by 2.5%, 6.5%, 4.5%, and 9.5% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA, and Static-NUCA LLC management schemes, respectively. An efficient classifier implementation is evaluated with an overhead of 5.44 KB, which translates to only 1.58% on top of the Static-NUCA baseline's cache-related per-core storage.

Pages: 718-752

Number of pages: 35
