Locality Protected Dynamic Cache Allocation Scheme on GPUs

Cited: 0
Authors
Zhang, Yang [1 ]
Xing, Zuocheng [1 ]
Zhou, Li [2 ]
Zhu, Chunsheng [3 ]
Affiliations
[1] Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Sch Elect Sci & Engn, Changsha, Hunan, Peoples R China
[3] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC, Canada
Keywords
PARALLELISM;
DOI
10.1109/TrustCom.2016.235
CLC Classification
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
As we approach the exascale era in supercomputing, designing a balanced computer system that combines powerful computing ability with low energy consumption becomes increasingly important. The GPU is a widely used accelerator in most recently deployed supercomputers; it adopts massive multithreading to hide long latencies and achieves high energy efficiency. In contrast to their strong computing power, GPUs have scarce on-chip resources, with only several MB of fast on-chip memory storage per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Because of this severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically pulled down by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality-protected scheme that makes full use of data locality within the fixed cache capacity. We present a Locality-Protected method based on instruction PC (LPP) to improve GPU performance. First, we use a PC-based collector to gather the reuse information of each cache line. After obtaining the dynamic reuse information of a cache line, an intelligent cache allocation unit (ICAU) coordinates the reuse information with the LRU (Least Recently Used) replacement policy to identify the cache line with the least locality for eviction. The results show that LPP provides up to a 17.8% speedup and an average improvement of 5.5% over the baseline method.
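The paper itself contains no code, but the eviction flow the abstract describes — a PC-based collector feeding reuse information into an allocation unit that coordinates with LRU — can be illustrated with a minimal sketch. All names here (`LPPSet`, `CacheLine`, the `reuse` table) are hypothetical, and the victim-selection rule (least reuse observed for the filling PC, LRU as tie-breaker) is an assumed reading of the abstract, not the authors' exact ICAU design:

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int          # block address tag
    fill_pc: int      # PC of the load instruction that filled this line
    last_access: int  # timestamp used for LRU ordering


class LPPSet:
    """One set of a set-associative cache under a sketched
    locality-protected policy: a PC-indexed reuse table stands in
    for the paper's collector, and victim selection coordinates
    that reuse information with LRU."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines: list[CacheLine] = []  # resident lines in this set
        self.reuse: dict[int, int] = {}   # fill PC -> observed reuse count
        self.clock = 0                    # logical time for LRU

    def access(self, tag: int, pc: int) -> bool:
        """Simulate one access; return True on hit, False on miss."""
        self.clock += 1
        for line in self.lines:
            if line.tag == tag:  # hit: credit the PC that filled the line
                line.last_access = self.clock
                self.reuse[line.fill_pc] = self.reuse.get(line.fill_pc, 0) + 1
                return True
        # Miss: if the set is full, evict the line with the least locality,
        # i.e. lowest reuse count for its filling PC, breaking ties by LRU.
        if len(self.lines) == self.ways:
            victim = min(
                self.lines,
                key=lambda l: (self.reuse.get(l.fill_pc, 0), l.last_access),
            )
            self.lines.remove(victim)
        self.lines.append(CacheLine(tag, pc, self.clock))
        return False
```

In this sketch a line filled by a PC with a history of reuse survives eviction even when it is the LRU candidate, which is the "protection" effect the abstract attributes to coordinating reuse information with the replacement policy.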
Pages: 1524-1530
Page count: 7
Related Papers
(50 total)
  • [1] Locality-protected cache allocation scheme with low overhead on GPUs
    Zhang, Yang
    Xing, Zuocheng
    Tang, Chuan
    Liu, Cang
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2018, 12 (03): : 87 - 94
  • [2] CWLP: coordinated warp scheduling and locality-protected cache allocation on GPUs
    Zhang, Yang
    Xing, Zuo-cheng
    Liu, Cang
    Tang, Chuan
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2018, 19 (02) : 206 - 220
  • [4] Locality-Driven Dynamic Flash Cache Allocation
    Xu, Liang
    Xia, Qianbin
    Xiao, Weijun
    2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI, 2017, : 185 - 193
  • [5] IMPROVING THE CACHE LOCALITY OF MEMORY ALLOCATION
    GRUNWALD, D
    ZORN, B
    HENDERSON, R
    SIGPLAN NOTICES, 1993, 28 (06): : 177 - 186
  • [6] Quantifying data locality in dynamic parallelism in GPUs
    Tang, Xulong
    Pattnaik, Ashutosh
    Kayiran, Onur
    Jog, Adwait
    Kandemir, Mahmut Taylan
    Das, Chita
    Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2018, 2 (03)
  • [7] Coordinated Static and Dynamic Cache Bypassing for GPUs
    Xie, Xiaolong
    Liang, Yun
    Wang, Yu
    Sun, Guangyu
    Wang, Tao
    2015 IEEE 21ST INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2015, : 76 - 88
  • [8] Locality-driven MRC Construction and Cache Allocation
    Fu, Jianyu
    Arteaga, Dulcardo
    Zhao, Ming
    HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING: POSTERS/DOCTORAL CONSORTIUM, 2018, : 19 - 20
  • [9] LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs
    Wang, Jin
    Rubin, Norm
    Sidelnik, Albert
    Yalamanchili, Sudhakar
    2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, : 583 - 595
  • [10] Effective cache coherence scheme using data locality
    Lee, Dongkwang
    Ahn, Byoungchul
    Kweon, Hyekseong
    Bae, Kukho
    Yoon, Kiryong
    IEEE Pacific RIM Conference on Communications, Computers, and Signal Processing - Proceedings, 1999, : 158 - 161