Adaptive Cache Management for Energy-efficient GPU Computing

Cited by: 92
Authors
Chen, Xuhao [1 ,2 ,3 ]
Chang, Li-Wen [3 ]
Rodrigues, Christopher I. [3 ]
Lv, Jie [3 ]
Wang, Zhiying [1 ,2 ]
Hwu, Wen-Mei [3 ]
Affiliations
[1] Natl Univ Def Technol, State Key Lab High Performance Comp, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China
[3] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL USA
Keywords
GPGPU; cache management; bypass; warp throttling; REPLACEMENT;
DOI
10.1109/MICRO.2014.11
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been introduced to GPU architectures to capture temporal and spatial locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and its cache hierarchy design, which limits system performance and energy efficiency. The massive number of memory requests generated by GPUs causes cache contention and resource congestion. Existing CPU cache management policies that are designed for multicore systems can be suboptimal when directly applied to GPU caches. We propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by a bypass policy based on reuse distance. Contention and resource congestion are detected at runtime. To avoid over-saturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the number of active warps. We also propose a simple predictor to dynamically estimate the optimal number of active warps that can take full advantage of the cache space and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks. This results in a harmonic mean IPC improvement of 74% and 17% (maximum 661% and 44% IPC improvement), compared to the baseline GPU architecture and optimal static warp throttling, respectively.
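The idea of bypassing based on reuse distance can be illustrated with a small offline sketch. This is not the paper's hardware mechanism, which detects contention at runtime; it is a toy model under stated assumptions: a fully associative LRU cache, an oracle that knows each line's next reuse distance, and a hypothetical `threshold` parameter above which a missing line is not inserted. The function names are illustrative only.

```python
from collections import OrderedDict

def next_reuse_distances(trace):
    """For each access, count the distinct addresses touched before the
    same address is accessed again (None if it is never reused)."""
    distances = []
    for i, addr in enumerate(trace):
        try:
            j = trace.index(addr, i + 1)   # position of the next reuse
        except ValueError:
            distances.append(None)         # no later access to this address
            continue
        distances.append(len(set(trace[i + 1:j])))
    return distances

def lru_hits(trace, capacity, threshold=None):
    """Count hits in a fully associative LRU cache holding `capacity` lines.

    If `threshold` is given (an assumption of this sketch), a missing line
    bypasses the cache when it is never reused or when its next reuse
    distance exceeds `threshold`.
    """
    cache = OrderedDict()                  # least recently used entry first
    hits = 0
    for addr, dist in zip(trace, next_reuse_distances(trace)):
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
            continue
        if threshold is not None and (dist is None or dist > threshold):
            continue                       # bypass: do not pollute the cache
        if len(cache) >= capacity:
            cache.popitem(last=False)      # evict the least recently used line
        cache[addr] = True
    return hits

# Two hot lines (A, B) interleaved with streaming lines that are never reused.
trace = ["A", "B", "s1", "s2", "s3", "A", "B", "s4", "s5", "s6", "A", "B"]
print(lru_hits(trace, capacity=2))               # prints 0
print(lru_hits(trace, capacity=2, threshold=4))  # prints 4
```

In this toy trace, plain LRU lets the streaming lines evict the hot lines before they are reused, so every access misses. With bypassing, the streaming lines never enter the cache and the hot lines hit on each reuse.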
Pages: 343-355
Page count: 13
Related papers
50 entries in total
  • [31] Hybrid Scratchpad and Cache Memory Management for Energy-Efficient Parallel HEVC Encoding
    Song, Chang
    Ju, Lei
    Jia, Zhiping
    2015 33RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2015, : 712 - 719
  • [32] Adaptive file cache management for mobile computing
    Mei, JM
    Bunt, R
    MOBILE DATA MANAGEMENT, PROCEEDINGS, 2003, 2574 : 369 - 373
  • [33] AOS: Adaptive Overwrite Scheme for Energy-Efficient MLC STT-RAM Cache
    Chen, Xunchao
    Khoshavi, Navid
    Zhou, Jian
    Huang, Dan
    DeMara, Ronald F.
    Wang, Jun
    Wen, Wujie
    Chen, Yiran
    2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2016,
  • [34] An Energy-Efficient Multi-GPU Supercomputer
    Rohr, David
    Kalcher, Sebastian
    Bach, Matthias
    Alaqeeli, Abdulqadir A.
    Alzaid, Hani M.
    Eschweiler, Dominic
    Lindenstruth, Volker
    Alkhereyf, Sakhar B.
    Alharthi, Ahmad
    Almubarak, Abdulelah
    Alqwaiz, Ibraheem
    Bin Suliman, Riman
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 42 - 45
  • [35] The Smart Cache: An Energy-Efficient Cache Architecture Through Dynamic Adaptation
    Sundararajan, Karthik T.
    Jones, Timothy M.
    Topham, Nigel P.
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2013, 41 (02) : 305 - 330
  • [36] PS-Cache: an energy-efficient cache design for chip multiprocessors
    Valls, Joan J.
    Ros, Alberto
    Sahuquillo, Julio
    Gomez, Maria E.
    JOURNAL OF SUPERCOMPUTING, 2015, 71 (01): : 67 - 86
  • [37] PS-Cache: an energy-efficient cache design for chip multiprocessors
    Joan J. Valls
    Alberto Ros
    Julio Sahuquillo
    Maria E. Gomez
    The Journal of Supercomputing, 2015, 71 : 67 - 86
  • [38] The Smart Cache: An Energy-Efficient Cache Architecture Through Dynamic Adaptation
    Karthik T. Sundararajan
    Timothy M. Jones
    Nigel P. Topham
    International Journal of Parallel Programming, 2013, 41 : 305 - 330
  • [39] PS-Cache: An Energy-Efficient Cache Design for Chip Multiprocessors
    Valls, Joan J.
    Ros, Alberto
    Sahuquillo, Julio
    Gomez, Maria E.
    2013 22ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), 2013, : 407 - 407
  • [40] A Novel Joint Mobile Cache and Power Management Scheme for Energy-Efficient Mobile Augmented Reality Service in Mobile Edge Computing
    Seo, Yong-Jun
    Lee, Joohyung
    Hwang, Jungyeon
    Niyato, Dusit
    Park, Hong-Shik
    Choi, Jun Kyun
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2021, 10 (05) : 1061 - 1065