Locality Protected Dynamic Cache Allocation Scheme on GPUs

Cited by: 0
Authors
Zhang, Yang [1 ]
Xing, Zuocheng [1 ]
Zhou, Li [2 ]
Zhu, Chunsheng [3 ]
Affiliations
[1] Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Sch Elect Sci & Engn, Changsha, Hunan, Peoples R China
[3] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC, Canada
Keywords
PARALLELISM;
DOI
10.1109/TrustCom.2016.235
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
As we approach the exascale era in supercomputing, designing a balanced computer system that combines powerful computing ability with low energy consumption becomes increasingly important. The GPU is a widely used accelerator in recently deployed supercomputers: it adopts massive multithreading to hide long latencies and achieves high energy efficiency. In contrast to their strong computing power, GPUs have few on-chip resources, with only several MB of fast on-chip memory storage per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Owing to this severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically pulled down by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality-protected scheme that makes full use of data locality within the fixed cache capacity. We present a Locality-Protected method based on the instruction PC (LPP) to improve GPU performance. First, a PC-based collector gathers the reuse information of each cache line. An intelligent cache allocation unit (ICAU) then combines this dynamic reuse information with the LRU (Least Recently Used) replacement policy to identify the cache line with the least locality for eviction. The results show that LPP provides up to a 17.8% speedup and an average improvement of 5.5% over the baseline method.
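The eviction mechanism described in the abstract can be sketched as a small software model. This is a hedged illustration of the general idea, not the authors' hardware design: the class name `LPPCacheSet`, the per-PC reuse counters, and the LRU tie-breaking rule are assumptions filled in from the abstract's description of the PC-based collector and the ICAU.

```python
# Simplified single-set model of a PC-based locality-protected cache
# (hypothetical sketch): a reuse collector indexed by the load's PC,
# plus an allocation unit that evicts the line whose PC has shown the
# least reuse, breaking ties by LRU order.
from collections import OrderedDict, defaultdict

class LPPCacheSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()     # tag -> load PC; iteration order = LRU (oldest first)
        self.reuse = defaultdict(int)  # PC -> observed reuse count (the "collector")

    def access(self, tag, pc):
        """Returns True on a hit, False on a miss (allocating the line)."""
        if tag in self.lines:
            self.reuse[self.lines[tag]] += 1  # credit the reuse to the line's load PC
            self.lines.move_to_end(tag)       # refresh LRU position
            return True
        if len(self.lines) >= self.ways:
            # Evict the line whose PC shows the least reuse; min() scans in
            # insertion (LRU) order, so ties fall back to the oldest line.
            victim = min(self.lines, key=lambda t: self.reuse[self.lines[t]])
            del self.lines[victim]
        self.lines[tag] = pc
        return False
```

In this toy model, a line loaded by a PC with a history of re-references survives eviction even when it is the LRU line, which mirrors the "protection" of high-locality lines that the abstract describes.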
Pages: 1524-1530 (7 pages)
Related papers (50 in total)
  • [31] Thread scheduling for cache locality
    Philbin, J.
    Edler, J.
    Anshus, O.J.
    Douglas, C.C.
    Li, K.
    COMPUTER ARCHITECTURE NEWS, 1996, 24 (Special Issue): 60 - 71
  • [32] Adaptive and Transparent Cache Bypassing for GPUs
    Li, Ang
    van den Braak, Gert-Jan
    Kumar, Akash
    Corporaal, Henk
    PROCEEDINGS OF SC15: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2015,
  • [34] Dynamic cache invalidation scheme for wireless mobile environments
    Madhukar, Alok
    Ozyer, Tansel
    Alhajj, Reda
    WIRELESS NETWORKS, 2009, 15 (06) : 727 - 740
  • [35] A Penalty Aware Memory Allocation Scheme for Key-value Cache
    Ou, Jianqiang
    Patton, Marc
    Moore, Michael Devon
    Xu, Yuehai
    Jiang, Song
    2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2015, : 530 - 539
  • [36] DYNAMIC STORAGE-ALLOCATION SCHEME
    ILIFFE, JK
    JODEIT, JG
    COMPUTER JOURNAL, 1962, 5 (03): 200+
  • [37] A Cache Allocation Scheme in 5G-Enabled Inhomogeneous ICVs
    Wang, Cong
    Chen, Chen
    Liu, Yangyang
    Fan, Kefeng
    Pei, Qingqi
    He, Ci
    Dou, Zhibin
    2020 IEEE 92ND VEHICULAR TECHNOLOGY CONFERENCE (VTC2020-FALL), 2020,
  • [38] Dynamic RACH Preamble Allocation Scheme
    Hwang, Hyun-Yong
    Oh, Sung-Min
    Lee, Changhee
    Kim, Jae Heung
    Shin, Jaesheung
    2015 INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC), 2015, : 770 - 772
  • [39] Soft error mitigation in cache memories of embedded systems by means of a protected scheme
    Zarandi, HR
    Miremadi, SG
    DEPENDABLE COMPUTING, PROCEEDINGS, 2005, 3747 : 121 - 130
  • [40] Dynamic Storage Cache Allocation in Multi-Server Architectures
    Prabhakar, Ramya
    Srikantaiah, Shekhar
    Patrick, Christina
    Kandemir, Mahmut
    PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS, 2009,