BACM: Barrier-Aware Cache Management for Irregular Memory-Intensive GPGPU Workloads

Cited by: 0
Authors
Liu, Yuxi [1, 4]
Zhao, Xia [4]
Yu, Zhibin [2]
Wang, Zhenlin [3]
Wang, Xiaolin [1]
Luo, Yingwei [1]
Eeckhout, Lieven [4]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[3] Michigan Tech Univ, Houghton, MI 49931 USA
[4] Univ Ghent, Ghent, Belgium
Keywords
DOI
10.1109/ICCD.2017.111
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
General-purpose workloads running on modern graphics processing units rely on hardware-based barriers to synchronize warps within a thread block (TB). However, if a GPGPU workload contains irregular memory accesses, warps may reach a barrier at very different times, i.e., some warps are critical while others are not. Ideally, cache space should be reserved for the critical warps. Unfortunately, current cache management policies are unaware of barriers and critical warps, which significantly limits the performance of irregular memory-intensive GPGPU workloads. In this paper, we propose Barrier-Aware Cache Management (BACM), which is built on top of two underlying policies: a greedy policy and a friendly policy. The greedy policy does not allow non-critical warps to allocate cache lines in the L1 data cache; only critical warps can. The friendly policy allows non-critical warps to allocate cache lines, but only over invalid or lower-priority cache lines. BACM dynamically chooses between the greedy and friendly policies based on the L1 data cache hit rate of the non-critical warps. By doing so, BACM reserves more cache space to accelerate critical warps, thereby improving overall performance. Experimental results show that BACM achieves average performance improvements of 24% and 20% over the GTO and BAWS policies, respectively. BACM's hardware cost is limited to 96 bytes per streaming multiprocessor.
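The interplay of the two policies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design. The abstract does not state which way the hit rate steers the choice, so the rule here (a low non-critical hit rate selects the greedy policy) and the `threshold` value are assumptions. The one-bit `priority` field, the names `choose_policy` and `pick_victim`, and the fallback victim for critical warps are likewise hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Policy(Enum):
    GREEDY = "greedy"      # non-critical warps may not allocate L1D lines
    FRIENDLY = "friendly"  # non-critical warps may take invalid/low-priority lines


@dataclass
class CacheLine:
    valid: bool = False
    # Hypothetical encoding: 1 = filled by a critical warp, 0 = lower priority.
    priority: int = 0


def choose_policy(noncritical_hits: int, noncritical_accesses: int,
                  threshold: float = 0.5) -> Policy:
    """Assumed selection rule (direction and threshold are not given in the
    abstract): if non-critical warps rarely hit in the L1D, stop letting
    them allocate; otherwise let them share the cache."""
    if noncritical_accesses == 0:
        return Policy.FRIENDLY
    hit_rate = noncritical_hits / noncritical_accesses
    return Policy.FRIENDLY if hit_rate >= threshold else Policy.GREEDY


def pick_victim(cache_set: List[CacheLine], warp_is_critical: bool,
                policy: Policy) -> Optional[int]:
    """Return the way index to allocate into, or None to bypass the cache."""
    # Greedy policy: only critical warps may allocate.
    if not warp_is_critical and policy is Policy.GREEDY:
        return None
    # Invalid lines are always acceptable victims.
    for way, line in enumerate(cache_set):
        if not line.valid:
            return way
    # Next, prefer a lower-priority line.
    for way, line in enumerate(cache_set):
        if line.priority == 0:
            return way
    # Every line is high priority: a non-critical warp must bypass, while a
    # critical warp evicts way 0 (placeholder for a real replacement policy).
    return 0 if warp_is_critical else None
```

For example, with a two-way set holding one critical and one lower-priority line, a non-critical warp bypasses the cache under the greedy policy and replaces the lower-priority line under the friendly policy.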
Pages: 633-640
Page count: 8