Shared Last-Level Cache Management and Memory Scheduling for GPGPUs with Hybrid Main Memory

Cited by: 0
Authors
Wang, Guan [1]
Zang, Chuanqi [2]
Ju, Lei [2]
Zhao, Mengying [1]
Cai, Xiaojun [1]
Jia, Zhiping [1]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao, Peoples R China
[2] Shandong Univ, Sch Software, Jinan, Shandong, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
NVM; GPGPU; hybrid memory; cache management; cache bypassing; memory scheduling; HIGH-PERFORMANCE; ALLOCATION; PCM;
DOI
10.1145/3230643
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Memory-intensive workloads are becoming increasingly popular on general-purpose graphics processing units (GPGPUs) and pose great challenges to GPGPU memory subsystem design. At the same time, with recent developments in non-volatile memory (NVM) technologies, hybrid memory combining DRAM and NVM can achieve high performance, low power, and high density simultaneously, which makes it a promising main memory design for GPGPUs. In this article, we explore shared last-level cache management for GPGPUs that takes the underlying hybrid main memory into account. To improve overall memory subsystem performance, we exploit both the asymmetric read/write latency of the hybrid main memory architecture and the memory coalescing feature of GPGPUs. In particular, to reduce the average cost of L2 cache misses, we prioritize cache blocks from DRAM or NVM based on the observation that operations to the NVM part of main memory have a large impact on system performance. Furthermore, the cache management scheme integrates GPU memory coalescing and cache bypassing techniques to further improve overall system performance. To minimize the impact of memory divergence among simultaneously executed groups of threads (warps), we propose a memory scheduling mechanism for GPGPUs that is aware of both the hybrid main memory and warps. Experimental results show that, in the context of a hybrid main memory system, our proposed L2 cache management policy and memory scheduling mechanism improve performance by 15.69% on average for memory-intensive benchmarks, with a maximum gain of up to 29%, and achieve an average memory subsystem energy reduction of 21.27%.
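The abstract does not spell out the replacement algorithm itself. As a rough illustration of the general idea of prioritizing L2 blocks by which part of hybrid memory backs them, the Python sketch below picks a victim by estimated eviction cost, so NVM-backed (and especially dirty NVM-backed) blocks tend to stay cached. This is an assumption-laden toy, not the authors' policy: the cost table, the `Block` fields, and the `eviction_cost`/`choose_victim` helpers are all hypothetical, and the numbers are not taken from the article.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical relative costs in arbitrary units. They only encode the
# assumption that NVM reads are slower than DRAM reads and NVM writes are
# much slower still; they are NOT values from the article.
READ_COST = {"DRAM": 1, "NVM": 3}
WRITE_COST = {"DRAM": 1, "NVM": 10}


@dataclass
class Block:
    tag: int
    source: str    # "DRAM" or "NVM": which part of hybrid memory backs the block
    dirty: bool    # True if evicting the block requires a writeback
    last_use: int  # time of most recent access (smaller = older)


def eviction_cost(block: Block) -> int:
    """Estimated penalty of evicting `block`: re-reading it on a future miss,
    plus writing it back now if it is dirty."""
    cost = READ_COST[block.source]
    if block.dirty:
        cost += WRITE_COST[block.source]
    return cost


def choose_victim(cache_set: list[Block]) -> Block:
    """Evict the cheapest block; among equal-cost blocks, evict the least
    recently used one."""
    return min(cache_set, key=lambda b: (eviction_cost(b), b.last_use))


if __name__ == "__main__":
    ways = [
        Block(tag=0xA, source="NVM", dirty=True, last_use=1),
        Block(tag=0xB, source="DRAM", dirty=False, last_use=5),
        Block(tag=0xC, source="NVM", dirty=False, last_use=2),
    ]
    print(hex(choose_victim(ways).tag))  # 0xb: the clean DRAM block is cheapest
```

Ordering purely by cost can pin stale NVM-backed blocks in the cache indefinitely; a practical policy would likely age or cap that priority, for example by falling back to LRU once a block has gone unused for some time.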
Pages: 25
Related papers
50 in total
  • [31] High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems
    Wu, Zhuanhao
    Kaushik, Anirudh
    Patel, Hiren
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (06)
  • [32] Hybrid-Comp: A Criticality-Aware Compressed Last-Level Cache
    Jadidi, Amin
    Arjomand, Mohammad
    Kandemir, Mahmut T.
    Das, Chita R.
    2018 19TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED), 2018, : 25 - 30
  • [33] Adaptive Memory-Side Last-Level GPU Caching
    Zhao, Xia
    Adileh, Almutaz
    Yu, Zhibin
    Wang, Zhiying
    Jaleel, Aamer
    Eeckhout, Lieven
    PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19), 2019, : 411 - 423
  • [34] Reuse locality aware cache partitioning for last-level cache
    Shen, Fanfan
    He, Yanxiang
    Zhang, Jun
    Li, Qingan
    Li, Jianhua
    Xu, Chao
    COMPUTERS & ELECTRICAL ENGINEERING, 2019, 74 : 319 - 330
  • [35] Writeback-Aware Partitioning and Replacement for Last-Level Caches in Phase Change Main Memory Systems
    Zhou, Miao
    Du, Yu
    Childers, Bruce
    Melhem, Rami
    Mosse, Daniel
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2012, 8 (04)
  • [36] Cache Friendliness-Aware Management of Shared Last-Level Caches for High Performance Multi-Core Systems
    Kaseridis, Dimitris
    Iqbal, Muhammad Faisal
    John, Lizy Kurian
    IEEE TRANSACTIONS ON COMPUTERS, 2014, 63 (04) : 874 - 887
  • [37] Two proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors
    Ros, Alberto
    Fernandez-Pascual, Ricardo
    Acacio, Manuel E.
    Garcia, Jose M.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2008, 68 (11) : 1413 - 1424
  • [38] Premier: A Concurrency-Aware Pseudo-Partitioning Framework for Shared Last-Level Cache
    Lu, Xiaoyang
    Wang, Rujia
    Sun, Xian-He
    2021 IEEE 39TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2021), 2021, : 391 - 394
  • [39] Shared Last-Level TLBs for Chip Multiprocessors
    Bhattacharjee, Abhishek
    Lustig, Daniel
    Martonosi, Margaret
    2011 IEEE 17TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2011, : 62 - 73
  • [40] RExCache: Rapid Exploration of Unified Last-level Cache
    Shwe, Su Myat Min
    Javaid, Haris
    Parameswaran, Sri
    2013 18TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2013, : 582 - 587