BACM: Barrier-Aware Cache Management for Irregular Memory-Intensive GPGPU Workloads

Cited by: 0
Authors
Liu, Yuxi [1,4]
Zhao, Xia [4]
Yu, Zhibin [2]
Wang, Zhenlin [3]
Wang, Xiaolin [1]
Luo, Yingwei [1]
Eeckhout, Lieven [4]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[3] Michigan Tech Univ, Houghton, MI 49931 USA
[4] Univ Ghent, Ghent, Belgium
Keywords
DOI
10.1109/ICCD.2017.111
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
General-purpose workloads running on modern graphics processing units rely on hardware-based barriers to synchronize warps within a thread block (TB). However, if a GPGPU workload contains irregular memory accesses, warp progress toward a barrier may be imbalanced: some warps are critical (they reach the barrier last) while others are not. Ideally, cache space should be reserved for the critical warps. Unfortunately, current cache management policies are unaware of barriers and critical warps, which significantly limits the performance of irregular memory-intensive GPGPU workloads. In this paper, we propose Barrier-Aware Cache Management (BACM), which is built on top of two underlying policies: a greedy policy and a friendly policy. The greedy policy does not allow non-critical warps to allocate cache lines in the L1 data cache; only critical warps can. The friendly policy allows non-critical warps to allocate cache lines, but only over invalid or lower-priority cache lines. BACM dynamically chooses between the greedy and friendly policies based on the L1 data cache hit rate of the non-critical warps. By doing so, BACM reserves more cache space to accelerate critical warps, thereby improving overall performance. Experimental results show that BACM improves performance by 24% and 20% on average over the GTO and BAWS policies, respectively. BACM's hardware cost is limited to 96 bytes per streaming multiprocessor.
Pages: 633-640
Number of pages: 8
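
The abstract describes BACM as a dynamic choice between two L1 data cache allocation policies, driven by the hit rate of the non-critical warps. The C++ sketch below illustrates that decision logic only, under stated assumptions: the structure and function names, the 0.5 hit-rate threshold, the sampling-window reset, and the reading of "lower-priority" as lines not allocated by critical warps are all illustrative choices, not the authors' implementation.

```cpp
// Minimal sketch of a BACM-style allocation decision, as described in the abstract.
// All names, thresholds, and the sampling scheme are assumptions for illustration.
#include <cstdint>

enum class Policy { Greedy, Friendly };   // the two underlying policies

struct CacheLine {
    bool valid    = false;
    bool critical = false;                // assumed flag: line allocated by a critical warp
};

struct BacmState {
    Policy   mode       = Policy::Friendly;  // currently selected policy
    uint32_t ncHits     = 0;                 // L1D hits by non-critical warps in this window
    uint32_t ncAccesses = 0;                 // L1D accesses by non-critical warps in this window
};

// Record the outcome of an L1D lookup issued by a non-critical warp.
void record_non_critical_access(BacmState& s, bool hit) {
    s.ncAccesses++;
    if (hit) s.ncHits++;
}

// Periodically re-select the policy from the non-critical hit rate.
// The 0.5 threshold and the switching direction are illustrative assumptions.
void update_policy(BacmState& s) {
    if (s.ncAccesses == 0) return;
    double hitRate = double(s.ncHits) / double(s.ncAccesses);
    s.mode = (hitRate < 0.5) ? Policy::Greedy : Policy::Friendly;
    s.ncHits = s.ncAccesses = 0;          // start a new sampling window
}

// On an L1D miss: may this warp fill into the chosen victim line?
bool may_allocate(const BacmState& s, bool warpIsCritical, const CacheLine& victim) {
    if (warpIsCritical) return true;            // critical warps always allocate
    if (s.mode == Policy::Greedy) return false; // greedy: non-critical warps bypass the L1D
    // friendly: non-critical warps may only replace invalid or lower-priority lines
    return !victim.valid || !victim.critical;
}
```

In a simulator, one could call record_non_critical_access on every non-critical L1D lookup, update_policy at fixed sampling intervals, and may_allocate on each miss to decide between filling the victim line and bypassing the L1 data cache.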