Shared Last-Level Cache Management and Memory Scheduling for GPGPUs with Hybrid Main Memory

Cited by: 0
Authors
Wang, Guan [1]
Zang, Chuanqi [2]
Ju, Lei [2]
Zhao, Mengying [1]
Cai, Xiaojun [1]
Jia, Zhiping [1]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao, Peoples R China
[2] Shandong Univ, Sch Software, Jinan, Shandong, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
NVM; GPGPU; hybrid memory; cache management; cache bypassing; memory scheduling; HIGH-PERFORMANCE; ALLOCATION; PCM;
DOI
10.1145/3230643
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Memory-intensive workloads are becoming increasingly popular on general-purpose graphics processing units (GPGPUs) and impose great challenges on GPGPU memory subsystem design. At the same time, with the recent development of non-volatile memory (NVM) technologies, hybrid memory that combines DRAM and NVM achieves high performance, low power, and high density simultaneously, which makes it a promising main memory design for GPGPUs. In this article, we explore shared last-level cache management for GPGPUs with consideration of the underlying hybrid main memory. To improve overall memory subsystem performance, we exploit both the asymmetric read/write latency of the hybrid main memory architecture and the memory coalescing feature of GPGPUs. In particular, to reduce the average cost of L2 cache misses, we prioritize cache blocks from DRAM or NVM based on the observation that operations on the NVM part of main memory have a large impact on system performance. Furthermore, the cache management scheme integrates GPU memory coalescing and cache bypassing techniques to improve overall system performance. To minimize the impact of memory divergence among simultaneously executed groups of threads, we propose a hybrid-main-memory-aware and warp-aware memory scheduling mechanism for GPGPUs. Experimental results show that, in the context of a hybrid main memory system, our proposed L2 cache management policy and memory scheduling mechanism improve performance by 15.69% on average for memory-intensive benchmarks, with a maximum gain of up to 29%, and reduce memory subsystem energy by 21.27% on average.
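The idea of lowering the average L2 miss cost by treating DRAM-backed and NVM-backed blocks differently can be illustrated with a small sketch. This is not the article's algorithm: the latency values, the `Block` type, the `pick_victim` function, and the `window` parameter are all hypothetical, chosen only to show what a cost-aware replacement decision might look like when a miss to NVM is assumed to be more expensive than a miss to DRAM.

```python
from dataclasses import dataclass

# Illustrative miss penalties in cycles. Assumed values, not taken from the article.
MISS_COST = {"DRAM": 100, "NVM": 300}


@dataclass
class Block:
    tag: int
    source: str    # backing memory of the block: "DRAM" or "NVM"
    last_use: int  # larger value means more recently used


def pick_victim(cache_set, window=2):
    """Hypothetical cost-aware LRU victim selection.

    Look at the `window` least recently used blocks and evict the one
    that is cheapest to fetch again; ties go to the older block.
    """
    candidates = sorted(cache_set, key=lambda b: b.last_use)[:window]
    return min(candidates, key=lambda b: (MISS_COST[b.source], b.last_use))


cache_set = [
    Block(tag=0xA, source="NVM", last_use=1),
    Block(tag=0xB, source="DRAM", last_use=2),
    Block(tag=0xC, source="DRAM", last_use=7),
    Block(tag=0xD, source="NVM", last_use=9),
]

# Plain LRU (window=1) would evict block 0xA; the cost-aware choice keeps
# that NVM-backed block and evicts the DRAM-backed block 0xB instead.
victim = pick_victim(cache_set)
print(hex(victim.tag))
```

With `window=1` the function degenerates to plain LRU, so the parameter can be read as a knob for how much recency the policy is willing to trade for a lower expected miss cost.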
Pages: 25