Reshaping Cache Misses to Improve Row-Buffer Locality in Multicore Systems

Cited by: 0

Authors:

Ding, Wei [1]

Liu, Jun [1]

Kandemir, Mahmut [1]

Irwin, Mary Jane [1]

Affiliations:

[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA

Source:

2013 22ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT) | 2013

Keywords:

Compiler Optimization; Data Locality; Data Transformation; Row Buffer; Memory System; Multicore;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Optimizing cache locality has been important ever since caches emerged, and numerous cache locality optimization schemes have been published in the compiler literature. However, in modern architectures, cache locality is not the only factor that determines memory system performance. Many emerging multicores employ banked memory systems in which each bank has a row-buffer that holds the most recently accessed memory row (page). A last-level cache miss that also misses in the row-buffer can experience much higher latency than one that hits in the row-buffer. Consequently, optimizing for row-buffer locality can be as important as optimizing for cache locality. Targeting emerging multicores and multithreaded applications, this paper presents a compiler-directed row-buffer locality optimization strategy. This strategy modifies the memory layout of data to increase the number of row-buffer hits without increasing the number of misses in the on-chip cache hierarchy. We implemented the proposed strategy in an open-source compiler and tested its effectiveness in improving row-buffer performance using a set of multithreaded applications. Our results indicate that the proposed approach improves the average data access latency by about 29%, which translates, on average, to an improvement of about 15% in execution time.
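To make the row-buffer effect described in the abstract concrete, the following is a minimal sketch of a row-buffer model with one open row per bank. It is not the paper's algorithm or its compiler pass. The row size, bank count, address mapping, and the two example miss streams are all assumptions chosen for illustration. Both streams contain the same number of last-level cache misses; only the addresses differ, as they would after a data layout change.

```python
# Hypothetical parameters, not taken from the paper.
ROW_BYTES = 4096   # assumed size of one memory row (page)
NUM_BANKS = 4      # assumed number of banks
LINE_BYTES = 64    # assumed cache line size


def map_address(addr):
    """Assumed mapping: consecutive rows are interleaved across banks."""
    row_index = addr // ROW_BYTES
    return row_index % NUM_BANKS, row_index // NUM_BANKS  # (bank, row)


def row_buffer_hit_rate(miss_addresses):
    """Fraction of last-level cache misses that hit in an open row-buffer."""
    open_row = {}  # bank -> row currently held in that bank's row-buffer
    hits = 0
    for addr in miss_addresses:
        bank, row = map_address(addr)
        if open_row.get(bank) == row:
            hits += 1                 # row already open: row-buffer hit
        else:
            open_row[bank] = row      # row-buffer miss: open the new row
    return hits / len(miss_addresses)


# 256 cache misses before a layout change: each miss lands in a
# different row of the same bank (stride of NUM_BANKS rows).
strided = [i * ROW_BYTES * NUM_BANKS for i in range(256)]

# The same 256 misses after a (hypothetical) layout change that places
# the accessed data in consecutive cache lines.
contiguous = [i * LINE_BYTES for i in range(256)]

print(row_buffer_hit_rate(strided))
print(row_buffer_hit_rate(contiguous))
```

Under these assumptions, the strided stream never finds its row open, while the contiguous stream misses only on the first access to each of the four rows it touches. The number of cache misses is identical in both cases, which mirrors the abstract's goal of raising row-buffer hits without adding on-chip cache misses.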


Pages: 235-244

Page count: 10

