Buffer Filter: A Last-level Cache Management Policy for CPU-GPGPU Heterogeneous System

Cited by: 5

Authors:

Li, Songyuan [2]

Meng, Jinglei [2]

Yu, Licheng [2]

Ma, Jianliang [2]

Chen, Tianzhou [2]

Wu, Minghui [1]

Affiliations:

[1] Zhejiang Univ City Coll, Dept Comp Sci & Engn, Hangzhou, Zhejiang, Peoples R China

[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China

Source:

2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS) | 2015

Keywords:

shared last-level cache; multicore; heterogeneous system; HIGH-PERFORMANCE; REPLACEMENT;

DOI:

10.1109/HPCC-CSS-ICESS.2015.290

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

There is a growing trend towards heterogeneous systems that integrate CPUs and GPGPUs on a single chip. Managing the on-chip resources shared between CPUs and GPGPUs, however, is a major challenge, and the last-level cache (LLC) is one of the most critical of these resources because of its impact on system performance. Well-known cache replacement policies such as LRU and DRRIP, which were designed for CPUs, are poorly suited to heterogeneous systems for two reasons. First, the LLC is dominated by memory accesses from the thousands of threads of GPGPU applications, which can significantly degrade CPU performance. Second, a GPGPU can tolerate memory latency when it has enough active threads, but these policies do not exploit this property. In this paper we propose Buffer Filter, a novel shared LLC management policy for CPU-GPGPU heterogeneous systems that takes advantage of the memory latency tolerance of GPGPUs. By adding a buffer to the memory system, the policy restricts streaming requests from the GPGPU and frees LLC space for cache-sensitive CPU applications. Although the GPGPU loses some IPC, its memory latency tolerance preserves the basic performance of GPGPU applications. Experiments show that Buffer Filter can filter out as much as 50% to 75% of all GPGPU streaming requests at the cost of a small decrease in GPGPU IPC, and that it improves the hit rate of CPU applications by 2x to 7x.
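
The effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's Buffer Filter design, which the abstract does not specify; it is a minimal model under stated assumptions: a fully associative LRU cache stands in for the LLC, every GPU request is treated as a streaming request, and a small separate buffer absorbs GPU requests so that they never allocate in the LLC. All names (`LRUCache`, `cpu_hit_rate`, `make_trace`) and all sizes are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache keyed by block address (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, insert addr and evict the LRU block."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # mark as most recently used
            return True
        self.blocks[addr] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return False


def cpu_hit_rate(trace, llc_blocks, gpu_buffer_blocks=0):
    """Replay (source, addr) pairs and return the CPU hit rate in the LLC.

    Assumption: when gpu_buffer_blocks > 0, GPU requests go to a separate
    small buffer and never allocate in the LLC.
    """
    llc = LRUCache(llc_blocks)
    buf = LRUCache(gpu_buffer_blocks) if gpu_buffer_blocks else None
    cpu_hits = cpu_accesses = 0
    for source, addr in trace:
        if source == "gpu" and buf is not None:
            buf.access(addr)  # filtered: the LLC is left untouched
            continue
        hit = llc.access(addr)
        if source == "cpu":
            cpu_accesses += 1
            cpu_hits += hit
    return cpu_hits / cpu_accesses


def make_trace(rounds=100, cpu_set=8, gpu_burst=16):
    """CPU loops over a small working set; GPU streams blocks it never reuses."""
    trace, next_gpu = [], 1_000_000
    for _ in range(rounds):
        for addr in range(cpu_set):
            trace.append(("cpu", addr))
        for _ in range(gpu_burst):
            trace.append(("gpu", next_gpu))
            next_gpu += 1
    return trace


trace = make_trace()
shared = cpu_hit_rate(trace, llc_blocks=16)
filtered = cpu_hit_rate(trace, llc_blocks=16, gpu_buffer_blocks=4)
print(f"CPU hit rate, shared LRU:   {shared:.2f}")    # 0.00
print(f"CPU hit rate, GPU filtered: {filtered:.2f}")  # 0.99
```

In this synthetic trace each GPU burst is as large as the whole LLC, so under shared LRU the CPU working set is evicted before it is reused. Once GPU requests are kept out of the LLC, only the first pass over the CPU working set misses. The numbers come from the toy trace and say nothing about the results reported in the paper.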

Pages: 266-271

Page count: 6

34 items in total

[11] Deadblock Aware Adaptive Eviction Policy for Shared Last-Level Cache
Wu, Zhaohui
Chen, Wei
Li, Bin
Liu, Zequan
Long, Shusheng
2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022, : 1498 - 1501
[12] An Application-Aware Cache Replacement Policy for Last-Level Caches
Warrier, Tripti S.
Anupama, B.
Mutyam, Madhu
ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2013, 2013, 7767 : 207 - 219
[13] Harvesting Row-Buffer Hits via Orchestrated Last-Level Cache and DRAM Scheduling for Heterogeneous Multicore Systems
Song, Yang
Alavoine, Olivier
Lin, Bill
ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2019, 24 (01)
[14] Shared Last-level Cache Management for GPGPUs with Hybrid Main Memory
Wang, Guan
Cai, Xiaojun
Ju, Lei
Zang, Chuanqi
Zhao, Mengying
Jia, Zhiping
PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2017, : 25 - 30
[15] A Last-Level Cache Management for Enhancing Endurance of Phase Change Memory
Lee, Won Jun
Kim, Chang Hyun
Kim, Seon Wook
2021 36TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC), 2021
[16] CWFP: Novel Collective Writeback and Fill Policy for Last-Level DRAM Cache
Yin, Shouyi
Xu, Weizhi
Li, Jiakun
Liu, Leibo
Wei, Shaojun
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2016, 24 (07) : 2548 - 2561
[17] OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches
Wang, Jue
Dong, Xiangyu
Xie, Yuan
DESIGN, AUTOMATION & TEST IN EUROPE, 2013, : 847 - 852
[18] SRAM- and STT-RAM-based hybrid, shared last-level cache for on-chip CPU–GPU heterogeneous architectures
Lan Gao
Rui Wang
Yunlong Xu
Hailong Yang
Zhongzhi Luan
Depei Qian
Han Zhang
Jihong Cai
The Journal of Supercomputing, 2018, 74 : 3388 - 3414
[19] Bulkyflip: A NAND-SPIN-Based Last-Level Cache With Bandwidth-Oriented Write Management Policy
Wu, Bi
Dai, Pengcheng
Wang, Zhaohao
Wang, Chao
Wang, Ying
Yang, Jianlei
Cheng, Yuanqing
Liu, Dijun
Zhang, Youguang
Zhao, Weisheng
Hu, Xiaobo Sharon
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2020, 67 (01) : 108 - 120
[20] SRAM- and STT-RAM-based hybrid, shared last-level cache for on-chip CPU-GPU heterogeneous architectures
Gao, Lan
Wang, Rui
Xu, Yunlong
Yang, Hailong
Luan, Zhongzhi
Qian, Depei
Zhang, Han
Cai, Jihong
JOURNAL OF SUPERCOMPUTING, 2018, 74 (07) : 3388 - 3414
