Decoupled Fused Cache: Fusing a Decoupled LLC with a DRAM Cache

Cited by: 10
Authors
Vasilakis, Evangelos [1 ]
Papaefstathiou, Vassilis [2 ]
Trancoso, Pedro [1 ]
Sourdis, Ioannis [1 ]
Affiliations
[1] Chalmers Univ Technol, CSE Dept, Rannvagen 6, Gothenburg, Sweden
[2] Fdn Res & Technol Hellas FORTH, 100 Nikolaou Plastira Str, Iraklion, Greece
Funding
European Research Council; EU Horizon 2020;
Keywords
Caches; 3D stacking; DRAM; processor; memory;
DOI
10.1145/3293447
CLC Classification
TP3 [Computing Technology; Computer Technology];
Subject Classification Code
0812;
Abstract
DRAM caches have shown excellent potential in capturing the spatial and temporal data locality of applications, capitalizing on advances in 3D-stacking technology; however, they are still far from their ideal performance. Besides the unavoidable DRAM access to fetch the requested data, tag access is in the critical path, adding significant latency and energy costs. Existing approaches are not able to remove these overheads and in some cases limit DRAM cache design options. For instance, caching DRAM cache tags adds constant latency to every access; accessing the DRAM cache using the TLB calls for OS support and DRAM cachelines as large as a page; reusing the last-level cache (LLC) tags to access the DRAM cache limits LLC performance, as it requires indexing the LLC using higher-order address bits. In this article, we introduce Decoupled Fused Cache, a DRAM cache design that alleviates the cost of tag accesses by fusing DRAM cache tags with the tags of the on-chip LLC without affecting LLC performance. In essence, the Decoupled Fused Cache relies in most cases on the LLC tag access to retrieve the information required for accessing the DRAM cache while avoiding additional overheads. Compared to current DRAM cache designs of the same cacheline size, Decoupled Fused Cache improves system performance by 6% on average and by 16% to 18% for large cacheline sizes. Finally, Decoupled Fused Cache reduces DRAM cache traffic by 18% and DRAM cache energy consumption by 7%.
Pages: 23
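The abstract above hinges on a single architectural idea: fuse the DRAM cache's tags with the on-chip LLC tags so that one LLC tag lookup already yields the information needed to locate the block in the DRAM cache, keeping a separate in-DRAM tag access off the critical path. The C++ sketch below is a minimal toy model of that general idea, not the paper's actual microarchitecture; the structures, field names (FusedTagEntry, dramWay), and indexing are assumptions made only for illustration.

    // Minimal illustrative sketch only (not the paper's actual design): it models
    // the general idea of keeping DRAM-cache placement metadata next to the
    // on-chip LLC tags, so that a single LLC tag probe also reveals whether the
    // block resides in the stacked DRAM cache and, if so, in which way.
    // All names (FusedTagEntry, dramWay, FusedTagArray) are hypothetical.
    #include <cstddef>
    #include <cstdint>
    #include <optional>
    #include <vector>

    struct FusedTagEntry {
        std::uint64_t tag       = 0;      // shared address tag
        bool          llcValid  = false;  // block present in the on-chip LLC data array
        bool          dramValid = false;  // block present in the stacked DRAM cache
        std::uint8_t  dramWay   = 0;      // DRAM-cache way holding the block (if dramValid)
    };

    struct LookupResult {
        bool llcHit = false;                  // serve directly from the LLC data array
        std::optional<std::uint8_t> dramWay;  // set iff the DRAM cache holds the block
    };

    class FusedTagArray {
    public:
        FusedTagArray(std::size_t sets, std::size_t ways)
            : sets_(sets), ways_(ways), entries_(sets * ways) {}

        // One on-chip tag probe answers two questions at once: is the block in the
        // LLC, and if not, where (if anywhere) does it live in the DRAM cache?
        LookupResult lookup(std::uint64_t blockAddr) const {
            const std::uint64_t set = blockAddr % sets_;
            const std::uint64_t tag = blockAddr / sets_;
            LookupResult r;
            for (std::size_t w = 0; w < ways_; ++w) {
                const FusedTagEntry& e = entries_[set * ways_ + w];
                if (e.tag == tag && (e.llcValid || e.dramValid)) {
                    r.llcHit = e.llcValid;
                    if (e.dramValid) r.dramWay = e.dramWay;
                    break;
                }
            }
            return r;
        }

    private:
        std::size_t sets_, ways_;
        std::vector<FusedTagEntry> entries_;
    };

    int main() {
        FusedTagArray tags(/*sets=*/1024, /*ways=*/16);
        const LookupResult r = tags.lookup(/*blockAddr=*/0x12345);
        // On a cold array this misses both levels, so data would come from main
        // memory; on a fused hit the DRAM-cache access can go straight to the
        // recorded way without a serialized in-DRAM tag read.
        return r.llcHit ? 0 : 1;
    }

The point of the sketch is the single probe: a request that misses the LLC but matches a fused tag already carries the DRAM-cache way, so the data fetch can proceed without a serialized tag read in DRAM.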