Buffer Filter: A Last-level Cache Management Policy for CPU-GPGPU Heterogeneous System

Cited by: 5
Authors
Li, Songyuan [2]
Meng, Jinglei [2]
Yu, Licheng [2]
Ma, Jianliang [2]
Chen, Tianzhou [2]
Wu, Minghui [1]
Affiliations
[1] Zhejiang Univ City Coll, Dept Comp Sci & Engn, Hangzhou, Zhejiang, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Keywords
shared last-level cache; multicore; heterogeneous system; HIGH-PERFORMANCE; REPLACEMENT;
DOI
10.1109/HPCC-CSS-ICESS.2015.290
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
There is a growing trend towards heterogeneous systems that integrate CPUs and GPGPUs on a single chip. Managing the on-chip resources shared between CPUs and GPGPUs, however, is a major challenge, and the last-level cache (LLC) is one of the most critical of these resources because of its impact on system performance. Well-known cache replacement policies such as LRU and DRRIP, which were designed for CPUs, are not well suited to heterogeneous systems: the LLC becomes dominated by memory accesses from the thousands of threads of GPGPU applications, which may significantly degrade CPU performance. In addition, a GPGPU can tolerate memory latency when it has enough active threads, but those policies do not exploit this property. In this paper we propose Buffer Filter, a novel shared LLC management policy for CPU-GPGPU heterogeneous systems that takes advantage of the memory latency tolerance of GPGPUs. The policy restricts GPGPU streaming requests by adding a buffer to the memory system, which frees LLC space for cache-sensitive CPU applications. Although the GPGPU loses some IPC, its memory latency tolerance preserves the basic performance of GPGPU applications. Experiments show that Buffer Filter filters out up to 50% to 75% of the total GPGPU streaming requests at the cost of a small decrease in GPGPU IPC, and improves the hit rate of CPU applications by 2x to 7x.
Pages: 266 - 271
Page count: 6
Related papers
34 in total
  • [11] Deadblock Aware Adaptive Eviction Policy for Shared Last-Level Cache
    Wu, Zhaohui
    Chen, Wei
    Li, Bin
    Liu, Zequan
    Long, Shusheng
    2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022, : 1498 - 1501
  • [12] An Application-Aware Cache Replacement Policy for Last-Level Caches
    Warrier, Tripti S.
    Anupama, B.
    Mutyam, Madhu
    ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2013, 2013, 7767 : 207 - 219
  • [13] Harvesting Row-Buffer Hits via Orchestrated Last-Level Cache and DRAM Scheduling for Heterogeneous Multicore Systems
    Song, Yang
    Alavoine, Olivier
    Lin, Bill
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2019, 24 (01)
  • [14] Shared Last-level Cache Management for GPGPUs with Hybrid Main Memory
    Wang, Guan
    Cai, Xiaojun
    Ju, Lei
    Zang, Chuanqi
    Zhao, Mengying
    Jia, Zhiping
    PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2017, : 25 - 30
  • [15] A Last-Level Cache Management for Enhancing Endurance of Phase Change Memory
    Lee, Won Jun
    Kim, Chang Hyun
    Kim, Seon Wook
    2021 36TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC), 2021,
  • [16] CWFP: Novel Collective Writeback and Fill Policy for Last-Level DRAM Cache
    Yin, Shouyi
    Xu, Weizhi
    Li, Jiakun
    Liu, Leibo
    Wei, Shaojun
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2016, 24 (07) : 2548 - 2561
  • [17] OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches
    Wang, Jue
    Dong, Xiangyu
    Xie, Yuan
    DESIGN, AUTOMATION & TEST IN EUROPE, 2013, : 847 - 852
  • [18] SRAM- and STT-RAM-based hybrid, shared last-level cache for on-chip CPU–GPU heterogeneous architectures
    Lan Gao
    Rui Wang
    Yunlong Xu
    Hailong Yang
    Zhongzhi Luan
    Depei Qian
    Han Zhang
    Jihong Cai
    The Journal of Supercomputing, 2018, 74 : 3388 - 3414
  • [19] Bulkyflip: A NAND-SPIN-Based Last-Level Cache With Bandwidth-Oriented Write Management Policy
    Wu, Bi
    Dai, Pengcheng
    Wang, Zhaohao
    Wang, Chao
    Wang, Ying
    Yang, Jianlei
    Cheng, Yuanqing
    Liu, Dijun
    Zhang, Youguang
    Zhao, Weisheng
    Hu, Xiaobo Sharon
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2020, 67 (01) : 108 - 120
  • [20] SRAM- and STT-RAM-based hybrid, shared last-level cache for on-chip CPU-GPU heterogeneous architectures
    Gao, Lan
    Wang, Rui
    Xu, Yunlong
    Yang, Hailong
    Luan, Zhongzhi
    Qian, Depei
    Zhang, Han
    Cai, Jihong
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (07) : 3388 - 3414