High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems

Cited by: 0
Authors
Wu, Zhuanhao [1]
Kaushik, Anirudh [2]
Patel, Hiren [3]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
[2] Intel Corp, Toronto, ON, Canada
[3] Univ Waterloo, Elect & Comp Engn, Waterloo, ON, Canada
Keywords
Last-level cache; inclusive cache; safety-critical systems; worst-case latency analysis; back invalidation
DOI
10.1145/3687308
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose ZeroCost-LLC (ZCLLC), a novel shared inclusive last-level cache (LLC) design for timing-predictable multi-core platforms that offers lower worst-case latency (WCL) than a traditional shared inclusive LLC design. ZCLLC achieves low WCL by eliminating certain cache line invalidations across the cache hierarchy. These invalidations arise when a core's memory request misses in the cache hierarchy and the LLC has no vacant entry to accommodate the fetched data. In addition to low WCL, ZCLLC offers performance benefits in the form of additional caching capacity, and unlike state-of-the-art approaches, it does not impose any constraints on its usage across multiple cores. In this work, we describe the impact of LLC cache line invalidations on the WCL and systematically build solutions that eliminate these invalidations, resulting in ZCLLC. We also present ZCLLC-OPT, an optimized variant of ZCLLC that offers lower WCL and better average-case performance than ZCLLC. ZCLLC-OPT optimizes the shared bus arbitration mechanism and extends the micro-architecture of ZCLLC to allow memory requests to the main memory to overlap. Our analysis shows that the analytical WCL of a memory request under ZCLLC-OPT is 87.0%, 93.8%, and 97.1% lower than that under state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. ZCLLC-OPT achieves average-case speedups of 1.89x, 3.36x, and 6.24x over the state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. Compared with the original ZCLLC without any optimizations (ZCLLC-NORMAL), the analytical WCL under ZCLLC-OPT is 76.5%, 82.6%, and 86.2% lower for 2, 4, and 8 cores, respectively.
Pages: 30