Shared Last-Level Cache Management and Memory Scheduling for GPGPUs with Hybrid Main Memory

Authors
Wang, Guan [1]
Zang, Chuanqi [2]
Ju, Lei [2]
Zhao, Mengying [1]
Cai, Xiaojun [1]
Jia, Zhiping [1]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao, Peoples R China
[2] Shandong Univ, Sch Software, Jinan, Shandong, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
NVM; GPGPU; hybrid memory; cache management; cache bypassing; memory scheduling; HIGH-PERFORMANCE; ALLOCATION; PCM
DOI
10.1145/3230643
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Memory-intensive workloads are becoming increasingly popular on general-purpose graphics processing units (GPGPUs) and impose great challenges on GPGPU memory subsystem design. Meanwhile, with the recent development of non-volatile memory (NVM) technologies, hybrid memory combining DRAM and NVM achieves high performance, low power, and high density simultaneously, which makes it a promising main memory design for GPGPUs. In this article, we explore shared last-level cache management for GPGPUs with consideration of the underlying hybrid main memory. To improve overall memory subsystem performance, we exploit both the asymmetric read/write latency of the hybrid main memory architecture and the memory coalescing feature of GPGPUs. In particular, to reduce the average cost of L2 cache misses, we prioritize cache blocks from DRAM or NVM based on the observation that operations to the NVM part of main memory have a large impact on system performance. Furthermore, the cache management scheme integrates GPU memory coalescing and cache bypassing techniques to improve overall system performance. To minimize the impact of memory divergence among simultaneously executing groups of threads, we propose a hybrid-main-memory-aware and warp-aware memory scheduling mechanism for GPGPUs. Experimental results show that, in the context of a hybrid main memory system, our proposed L2 cache management policy and memory scheduling mechanism improve performance by 15.69% on average for memory-intensive benchmarks, with a maximum gain of up to 29%, and achieve an average memory subsystem energy reduction of 21.27%.
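
The idea of lowering the average L2 miss cost by treating DRAM-backed and NVM-backed blocks differently can be illustrated with a small sketch. This is not the article's policy, which the abstract does not specify. It is a generic cost-aware variant of LRU written under stated assumptions: the latency constants, the `Block` fields, the `window` parameter, and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical miss latencies in cycles; illustrative values, not from the article.
DRAM_MISS_COST = 100
NVM_MISS_COST = 300


@dataclass
class Block:
    tag: int
    in_nvm: bool   # True if the block's home is the NVM part of main memory
    last_use: int  # timestamp of the most recent access


def average_miss_cost(nvm_fraction: float) -> float:
    """Expected cost of one L2 miss when nvm_fraction of misses go to NVM."""
    return nvm_fraction * NVM_MISS_COST + (1.0 - nvm_fraction) * DRAM_MISS_COST


def pick_victim(cache_set: list[Block], window: int = 2) -> Block:
    """Choose a victim among the `window` least recently used blocks,
    preferring a DRAM-backed block because refetching it is cheaper."""
    candidates = sorted(cache_set, key=lambda b: b.last_use)[:window]
    for block in candidates:
        if not block.in_nvm:
            return block
    # All candidates are NVM-backed: fall back to plain LRU.
    return candidates[0]
```

Under these assumed latencies, keeping NVM-backed blocks in the cache longer shifts misses toward DRAM, which lowers `average_miss_cost`. The `window` bound keeps the policy close to LRU, so an NVM-backed block that is never reused is still evicted eventually.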
Pages: 25