Reshaping Cache Misses to Improve Row-Buffer Locality in Multicore Systems

Cited by: 0
Authors
Ding, Wei [1]
Liu, Jun [1]
Kandemir, Mahmut [1]
Irwin, Mary Jane [1]
Affiliation
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Keywords
Compiler Optimization; Data Locality; Data Transformation; Row Buffer; Memory System; Multicore
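DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812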
Abstract
Optimizing cache locality has been important ever since caches emerged, and numerous cache locality optimization schemes have been published in the compiler literature. In modern architectures, however, cache locality is not the only factor that determines memory system performance. Many emerging multicores employ banked memory systems in which each bank has an attached row buffer that holds the most recently accessed memory row (page). A last-level cache miss that also misses in the row buffer can experience much higher latency than a cache miss that hits in the row buffer. Consequently, optimizing for row-buffer locality can be as important as optimizing for cache locality. Targeting emerging multicores and multithreaded applications, this paper presents a compiler-directed row-buffer locality optimization strategy. The strategy modifies the memory layout of data to increase the number of row-buffer hits without increasing the number of misses in the on-chip cache hierarchy. We implemented the proposed strategy in an open-source compiler and tested its effectiveness in improving row-buffer performance using a set of multithreaded applications. Our results indicate that the proposed approach improves the average data access latency by about 29%, which translates, on average, to about a 15% improvement in execution time.
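As a rough illustration of the row-buffer effect the abstract describes, the sketch below counts row-buffer hits for one stream of last-level cache misses under two data layouts. It is not the paper's compiler algorithm. The row size, bank count, row-interleaved address mapping, and the helper names `row_buffer_hits` and `miss_stream` are all assumptions made for this example. Each array element is taken to be exactly one cache line, so both layouts produce the same number of cache misses and differ only in row-buffer behavior.

```python
ROW_SIZE = 4096   # bytes per DRAM row (assumed)
NUM_BANKS = 8     # number of banks (assumed)
LINE_SIZE = 64    # bytes per cache line (assumed)


def row_buffer_hits(addresses, row_size=ROW_SIZE, num_banks=NUM_BANKS):
    """Count how many accesses find their row already open in their bank.

    Assumes consecutive DRAM rows are interleaved across banks and that
    each bank keeps only its most recently accessed row open.
    """
    open_row = {}  # bank -> row currently held in that bank's row buffer
    hits = 0
    for addr in addresses:
        row_index = addr // row_size
        bank = row_index % num_banks
        row = row_index // num_banks
        if open_row.get(bank) == row:
            hits += 1
        else:
            open_row[bank] = row  # row-buffer miss: open the new row
    return hits


def miss_stream(n_rows, n_cols, column_major, elem_size=LINE_SIZE):
    """Yield the addresses touched by a column-by-column array traversal.

    Every element is one cache line and is touched once, so every access
    is a cache miss regardless of the layout.
    """
    for j in range(n_cols):
        for i in range(n_rows):
            offset = j * n_rows + i if column_major else i * n_cols + j
            yield offset * elem_size


if __name__ == "__main__":
    n = 64  # 64 x 64 elements = 4096 misses in either layout
    print(row_buffer_hits(miss_stream(n, n, column_major=False)))  # prints 0
    print(row_buffer_hits(miss_stream(n, n, column_major=True)))   # prints 4032
```

In this toy model, the row-major layout sends consecutive misses to a different DRAM row each time, so none of them hit in a row buffer. The column-major layout keeps each run of 64 misses inside one DRAM row, so all but the first miss of each run hit. The miss count is 4096 in both cases, which matches the constraint stated in the abstract that the layout change should not increase cache misses.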
Pages: 235-244
Page count: 10