Locality Protected Dynamic Cache Allocation Scheme on GPUs

Cited by: 0
Authors
Zhang, Yang [1 ]
Xing, Zuocheng [1 ]
Zhou, Li [2 ]
Zhu, Chunsheng [3 ]
Affiliations
[1] Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Sch Elect Sci & Engn, Changsha, Hunan, Peoples R China
[3] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC, Canada
Keywords
PARALLELISM;
DOI
10.1109/TrustCom.2016.235
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
As we approach the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low energy consumption becomes increasingly important. GPUs are widely used accelerators in recently deployed supercomputers. They adopt massive multithreading to hide long latencies and achieve high energy efficiency. In contrast to their strong computing power, GPUs have few on-chip resources, with only several MB of fast on-chip memory storage per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Because of this severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically reduced by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality-protected scheme to make full use of data locality within the fixed cache capacity. We present a Locality Protected method based on instruction PC (LPP) to improve GPU performance. First, we use a PC-based collector to gather the reuse information of each cache line. Given this dynamic reuse information, an intelligent cache allocation unit (ICAU) coordinates the reuse information with the LRU (Least Recently Used) replacement policy to find the cache line with the least locality for eviction. The results show that LPP provides up to a 17.8% speedup and an average improvement of 5.5% over the baseline method.
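The abstract's mechanism can be illustrated with a small sketch: a set of cache lines in LRU order, a PC-indexed collector that counts reuses attributed to each inserting instruction, and an ICAU-style victim choice that picks, from the least-recently-used candidates, the line whose inserting PC has shown the least reuse. This is a hypothetical simplification for illustration only; the class name `LPPCache`, the half-of-LRU candidate window, and the tie-breaking rule are assumptions, not the paper's actual hardware design.

```python
from collections import OrderedDict

class LPPCache:
    """Toy model of a reuse-aware (LPP-like) cache replacement policy."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        # OrderedDict maintains LRU order: oldest (LRU) entry first.
        # value = (inserting PC, per-line reuse count)
        self.lines = OrderedDict()
        # PC-based collector: reuse counts observed per instruction PC.
        self.pc_reuse = {}

    def access(self, pc, addr):
        """Access `addr` from instruction `pc`; return True on hit."""
        if addr in self.lines:                        # hit
            ins_pc, reuses = self.lines.pop(addr)
            self.lines[addr] = (ins_pc, reuses + 1)   # move to MRU position
            self.pc_reuse[ins_pc] = self.pc_reuse.get(ins_pc, 0) + 1
            return True
        if len(self.lines) >= self.num_lines:         # miss with full cache
            self._evict()
        self.lines[addr] = (pc, 0)                    # insert new line
        return False

    def _evict(self):
        # ICAU-style choice: among the LRU half of the lines, evict the
        # one whose inserting PC has the least observed reuse
        # (ties fall back to plain LRU order).
        half = max(1, len(self.lines) // 2)
        candidates = list(self.lines.items())[:half]
        victim = min(candidates,
                     key=lambda kv: self.pc_reuse.get(kv[1][0], 0))[0]
        del self.lines[victim]
```

In this sketch, a line inserted by a PC with a history of reuse is protected against eviction even when it sits near the LRU end, which is the "locality protection" effect the abstract describes.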
Pages: 1524-1530 (7 pages)
Related Papers
50 records
  • [21] Scratch That (But Cache This): A Hybrid Register Cache/Scratchpad for GPUs
    Bailey, Jonathan
    Kloosterman, John
    Mahlke, Scott
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (11) : 2779 - 2789
  • [22] A Novel Cache Scheme based on Content Popularity and User Locality for Future Internet
    Tseng, Fan-Hsun
    Chien, Wei-Che
    Wang, Sheng-Jie
    Lai, Chin Feng
    Chao, Han-Chieh
    2018 27TH WIRELESS AND OPTICAL COMMUNICATION CONFERENCE (WOCC), 2018, : 133 - 137
  • [23] Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection
    Zhu, Xian
    Wernsman, Robert
    Zambreno, Joseph
    PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2018,
  • [24] A Lightweight and Adaptive Cache Allocation Scheme for Content Delivery Networks
    Liu, Ke
    Wang, Hua
    Zhou, Ke
    Li, Cong
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [25] Allocation by conflict: A simple, effective multilateral cache management scheme
    Tam, ES
    Vlaovic, SA
    Tyson, GS
    Davidson, ES
    2001 INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD 2001, PROCEEDINGS, 2001, : 133 - 140
  • [26] Cache miss-aware Dynamic Stack Allocation
    Sung-Joon, Jang
    Chung, Moo-Kyoung
    Kim, Jaemoon
    Kyung, Chong-Min
    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007, : 3494 - +
  • [27] A NOVEL CACHE MAPPING SCHEME FOR DYNAMIC SET-BASED CACHE PARTITIONING
    Lee, Tsung
    Tsou, Hsiang-Hua
    2009 IEEE YOUTH CONFERENCE ON INFORMATION, COMPUTING AND TELECOMMUNICATION, PROCEEDINGS, 2009, : 459 - 462
  • [28] Effective Cache Bank Placement for GPUs
    Sadrosadati, Mohammad
    Mirhosseini, Amirhossein
    Roozkhosh, Shahin
    Bakhishi, Hazhir
    Sarbazi-Azad, Hamid
    PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2017, : 31 - 36
  • [30] DULO: An effective buffer cache management scheme to exploit both temporal and spatial locality
    Jiang, S
    Ding, XN
    Chen, F
    Tan, EH
    Zhang, XD
    USENIX ASSOCIATION PROCEEDINGS OF THE 4TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2005, : 101 - 114
  • [30] Thread scheduling for cache locality
    Philbin, J
    Edler, J
    Anshus, OJ
    Douglas, CC
    Li, K
    ACM SIGPLAN NOTICES, 1996, 31 (09) : 60 - 71