BACM: Barrier-Aware Cache Management for Irregular Memory-Intensive GPGPU Workloads

Cited by: 0

Authors:

Liu, Yuxi [1,4]

Zhao, Xia [4]

Yu, Zhibin [2]

Wang, Zhenlin [3]

Wang, Xiaolin [1]

Luo, Yingwei [1]

Eeckhout, Lieven [4]

Affiliations:

[1] Peking Univ, Beijing, Peoples R China

[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China

[3] Michigan Tech Univ, Houghton, MI 49931 USA

[4] Univ Ghent, Ghent, Belgium

Source:

2017 IEEE 35TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD) | 2017

Keywords:

DOI:

10.1109/ICCD.2017.111

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

General-purpose workloads running on modern graphics processing units rely on hardware-based barriers to synchronize warps within a thread block (TB). However, imbalance may exist before reaching a barrier if a GPGPU workload contains irregular memory accesses, i.e., some warps may be critical while others may not. Ideally, cache space should be reserved for the critical warps. Unfortunately, current cache management policies are unaware of the existence of barriers and critical warps, which significantly limits the performance of irregular memory-intensive GPGPU workloads. In this paper, we propose Barrier-Aware Cache Management (BACM), which is built on top of two underlying policies: a greedy policy and a friendly policy. The greedy policy does not allow non-critical warps to allocate cache lines in the L1 data cache; only critical warps can. The friendly policy allows non-critical warps to allocate cache lines, but only over invalid or lower-priority cache lines. BACM dynamically chooses between the greedy and friendly policies based on the L1 data cache hit rate for the non-critical warps. By doing so, BACM reserves more cache space to accelerate critical warps, thereby improving overall performance. Experimental results show that BACM achieves an average performance improvement of 24% and 20% compared to the GTO and BAWS policies, respectively. BACM's hardware cost is limited to 96 bytes per streaming multiprocessor.
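
The two allocation policies and the dynamic choice between them, as described in the abstract, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `Line`, `choose_policy`, and `pick_victim`, the two-level priority encoding, and the 0.2 hit-rate threshold are all hypothetical. The abstract also does not say which policy a high or low hit rate selects; the sketch assumes that non-critical warps with a reasonable hit rate get the friendly policy, and the greedy policy otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional

GREEDY, FRIENDLY = "greedy", "friendly"
CRITICAL_PRIORITY = 1  # assumption: lines filled by critical warps get priority 1, others 0


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    priority: int = 0


def choose_policy(noncrit_hits: int, noncrit_accesses: int,
                  threshold: float = 0.2) -> str:
    """Pick a policy from the L1D hit rate of non-critical warps.

    Assumption: the threshold value and the direction of the comparison
    are illustrative; the abstract only states that the hit rate drives
    the choice.
    """
    if noncrit_accesses == 0:
        return GREEDY
    hit_rate = noncrit_hits / noncrit_accesses
    return FRIENDLY if hit_rate >= threshold else GREEDY


def pick_victim(cache_set: List[Line], is_critical: bool,
                policy: str) -> Optional[int]:
    """Return the way to fill on a miss, or None to bypass the L1D."""
    if not is_critical and policy == GREEDY:
        # Greedy: non-critical warps never allocate, even into invalid lines.
        return None
    # Invalid lines are always the first choice for any allowed allocation.
    for way, line in enumerate(cache_set):
        if not line.valid:
            return way
    if is_critical:
        # Critical warps may evict any line; prefer the lowest priority.
        return min(range(len(cache_set)), key=lambda w: cache_set[w].priority)
    # Friendly: non-critical warps may only replace lower-priority lines.
    for way, line in enumerate(cache_set):
        if line.priority < CRITICAL_PRIORITY:
            return way
    return None  # every line belongs to a critical warp, so bypass


if __name__ == "__main__":
    ways = [Line(True, 0xA, CRITICAL_PRIORITY), Line(True, 0xB, 0)]
    policy = choose_policy(noncrit_hits=5, noncrit_accesses=10)
    print(policy, pick_victim(ways, is_critical=False, policy=policy))
```

In this model a bypassed request (`None`) would be served from the next cache level without displacing lines held for critical warps, which is how the sketch interprets "reserving" cache space.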


Pages: 633-640

Page count: 8

