BP-NUCA: CACHE PRESSURE-AWARE MIGRATION FOR HIGH-PERFORMANCE CACHING IN CMPS

Cited: 0
|
Authors
Jia, Xiaomin [1 ]
Jiang, Jiang [1 ]
Wang, Yongwen [1 ]
Qi, Shubo [1 ]
Zhao, Tianlei [1 ]
Fu, Guitao [1 ]
Zhang, Minxuan [1 ]
Institutions
[1] Natl Univ Def Technol, Sch Comp, Dept Microelect, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chip multi-processors (CMPs); last-level cache (LLC); block migration; non-uniform cache architecture (NUCA);
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As the momentum behind Chip Multi-Processors (CMPs) continues to grow, Last-Level Cache (LLC) management becomes a crucial issue for CMPs because off-chip accesses often incur a large latency. A private cache design is distinguished by smaller local access latency, good performance isolation and easy scalability, and is therefore becoming an attractive design alternative for the LLC of CMPs. This paper proposes Balanced Private Non-Uniform Cache Architecture (BP-NUCA), a new LLC architecture that starts from a private cache design for smaller local access latency and good performance isolation, and then introduces a low-cost mechanism to dynamically migrate private blocks among the peer private caches of the LLC to improve overall space utilization. BP-NUCA achieves this by measuring the access pressure level that each cache set experiences at runtime and then using this information to guide block migration among the different private caches of the LLC. A heavily accessed set, namely a set with a high access pressure level, is allowed to migrate its evicted blocks to peer private caches, replacing blocks of same-index sets that have a low access pressure level. By migrating blocks from heavily accessed cache sets to less accessed cache sets, BP-NUCA effectively balances the space utilization of the LLC among different cores. Experimental results using a full-system CMP simulator show that BP-NUCA improves overall throughput by as much as 20.3%, 12.4%, 14.5% and 18.0% (on average 7.7%, 4.4%, 4.0% and 6.1%) over a private cache, a shared cache, the shared cache management scheme UCP and the private cache organization CC, respectively, on a 4-core CMP running SPEC CPU2006 benchmarks.
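The abstract describes the migration policy only in prose. As a rough illustration of the idea, the following Python sketch models per-set saturating pressure counters and the spill of a victim block from a high-pressure set into the same-index, low-pressure set of a peer private cache. All names, structures and parameters here (PrivateLLC, migrate, HI_PRESSURE, the toy cache geometry and thresholds) are assumptions made for illustration only; they are not taken from the BP-NUCA paper.

    # Minimal, illustrative sketch of pressure-aware block migration.
    # All sizes, thresholds and names are hypothetical, NOT from the paper.
    from collections import OrderedDict

    NUM_CORES   = 4      # peer private LLC slices (one per core)
    NUM_SETS    = 8      # sets per private LLC (toy size)
    WAYS        = 4      # associativity
    HI_PRESSURE = 6      # counter value at which a set counts as "heavily accessed"
    LO_PRESSURE = 2      # counter value below which a set may receive migrated blocks
    MAX_COUNT   = 7      # 3-bit saturating counter

    class PrivateLLC:
        """One core's private LLC slice (LRU replacement within each set)."""
        def __init__(self):
            # Each set is an LRU-ordered map: tag -> True.
            self.sets = [OrderedDict() for _ in range(NUM_SETS)]
            # Per-set saturating pressure counters, bumped on every access.
            self.pressure = [0] * NUM_SETS

        def access(self, addr):
            idx, tag = addr % NUM_SETS, addr // NUM_SETS
            self.pressure[idx] = min(self.pressure[idx] + 1, MAX_COUNT)
            s = self.sets[idx]
            if tag in s:                      # hit: refresh LRU position
                s.move_to_end(tag)
                return True, None
            s[tag] = True                     # miss: insert, maybe evict the LRU victim
            victim = s.popitem(last=False)[0] if len(s) > WAYS else None
            return False, victim

    def migrate(evicting_core, set_idx, victim_tag, llcs):
        """On eviction from a high-pressure set, try to place the victim into a
        same-index, low-pressure set of a peer private cache."""
        if llcs[evicting_core].pressure[set_idx] < HI_PRESSURE:
            return False                      # set not under pressure: just drop the victim
        for core, llc in enumerate(llcs):
            if core == evicting_core:
                continue
            if llc.pressure[set_idx] <= LO_PRESSURE:
                peer = llc.sets[set_idx]
                if len(peer) >= WAYS:         # make room by evicting the peer's LRU block
                    peer.popitem(last=False)
                peer[victim_tag] = True       # victim now lives in the peer slice
                return True
        return False

    if __name__ == "__main__":
        llcs = [PrivateLLC() for _ in range(NUM_CORES)]
        # Core 0 hammers set 3 while the other cores stay idle (low pressure),
        # so core 0's evicted blocks spill into the peers' set 3.
        for addr in range(3, 3 + 16 * NUM_SETS, NUM_SETS):
            hit, victim = llcs[0].access(addr)
            if victim is not None:
                migrated = migrate(0, 3, victim, llcs)
                print(f"evicted tag {victim}: migrated={migrated}")

Running the script, core 0's early victims are dropped while its set is still below the pressure threshold; once the set becomes heavily accessed, victims are placed into idle peers, which is the space-balancing behaviour the abstract describes.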
Pages: 1037-1060
Page count: 24
Related Papers
14 records in total
  • [1] Cache pressure-aware caching scheme for content-centric networking
    Luo, Xi
    An, Ying
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (02) : 795 - 806
  • [2] Adaptive Spill-Receive for Robust High-Performance Caching in CMPs
    Qureshi, Moinuddin K.
    [J]. HPCA-15 2009: FIFTEENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS, 2009, : 45 - 54
  • [3] PACMan: Prefetch-Aware Cache Management for High Performance Caching
    Wu, Carole-Jean
    Jaleel, Aamer
    Martonosi, Margaret
    Steely, Simon C., Jr.
    Emer, Joel
    [J]. PROCEEDINGS OF THE 2011 44TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 44), 2011, : 442 - 453
  • [4] Cooperative Partitioning: Energy-Efficient Cache Partitioning for High-Performance CMPs
    Sundararajan, Karthik T.
    Porpodas, Vasileios
    Jones, Timothy M.
    Topham, Nigel P.
    Franke, Bjoern
    [J]. 2012 IEEE 18TH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2012, : 311 - 322
  • [5] Flash-Aware High-Performance and Endurable Cache
    Xia, Qianbin
    Xiao, Weijun
    [J]. 2015 IEEE 23RD INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2015), 2015, : 47 - 50
  • [6] A Reusability-Aware Cache Memory Sharing Technique for High-Performance Low-Power CMPs with Private L2 Caches
    Youn, Sungjune
    Kim, Hyunhee
    Kim, Jihong
    [J]. ISLPED'07: PROCEEDINGS OF THE 2007 INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN, 2007, : 56 - 61
  • [7] LP-NUCA: Networks-in-Cache for High-Performance Low-Power Embedded Processors
    Suarez Gracia, Dario
    Dimitrakopoulos, Giorgos
    Monreal Arnal, Teresa
    Katevenis, Manolis G. H.
    Vinals Yufera, Victor
    [J]. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2012, 20 (08) : 1510 - 1523
  • [8] High-Performance and Endurable Cache Management for Flash-Based Read Caching
    Xia, Qianbin
    Xiao, Weijun
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (12) : 3518 - 3531
  • [9] GL-Cache: Group-level learning for efficient and high-performance caching
    Yang, Juncheng
    Mao, Ziming
    Yue, Yao
    Rashmi, K. V.
    [J]. PROCEEDINGS OF THE 21ST USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, FAST 2023, 2023, : 115 - 133
  • [10] LAC: A Workload Intensity-Aware Caching Scheme for High-Performance SSDs
    Sun, Hui
    Tong, Haoqiang
    Yue, Yinliang
    Qin, Xiao
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (07) : 1738 - 1752