CPU-Assisted GPGPU on Fused CPU-GPU Architectures

Cited by: 0

Authors:

Yang, Yi [1]

Xiang, Ping [1]

Mantor, Mike [2]

Zhou, Huiyang [1]

Affiliations:

[1] North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA

[2] Adv Micro Devices Inc, Graph Prod Grp, Sunnyvale, CA USA

Source:

2012 IEEE 18TH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) | 2012

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper presents a novel approach that uses CPU resources to facilitate the execution of GPGPU programs on fused CPU-GPU architectures. In our model of fused architectures, the GPU and the CPU are integrated on the same die and share the on-chip L3 cache and off-chip memory, similar to the latest Intel Sandy Bridge and AMD accelerated processing unit (APU) platforms. In our proposed CPU-assisted GPGPU, after the CPU launches a GPU program, it executes a pre-execution program, which is generated automatically from the GPU kernel using our proposed compiler algorithms and contains the memory access instructions of the GPU kernel for multiple thread blocks. The CPU pre-execution program runs ahead of the GPU threads because (1) the CPU pre-execution thread contains only the memory fetch instructions from the GPU kernel and none of its floating-point computations, and (2) the CPU runs at a higher frequency and exploits a higher degree of instruction-level parallelism than the GPU scalar cores. We also leverage the prefetcher at the L2 cache on the CPU side to increase the memory traffic from the CPU. As a result, the memory accesses of GPU threads hit in the L3 cache and their latency can be drastically reduced. Since our pre-execution is directly controlled by user-level applications, it enjoys both high accuracy and flexibility. Our experiments on a set of benchmarks show that the proposed pre-execution improves performance by up to 113%, and by 21.4% on average.
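To make the pre-execution idea concrete, the following is a minimal sketch of what a CPU pre-execution routine might look like for a hypothetical vector-add GPU kernel (`c[i] = a[i] + b[i]`, one element per thread). It is not the output of the authors' compiler algorithms, which the abstract does not detail. The names `touch_lines` and `pre_execute`, the 64-byte cache line size, and the one-load-per-line policy are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cache line size in bytes (hypothetical; platform dependent).
constexpr std::size_t kCacheLineBytes = 64;

// Issue one load per assumed cache line over data[first, last).
// No arithmetic is done on the loaded values: only the memory fetch
// of the (hypothetical) GPU kernel is reproduced.
// Simplification: `first` is assumed to fall on a cache line boundary.
std::size_t touch_lines(const float* data, std::size_t first, std::size_t last) {
    constexpr std::size_t stride = kCacheLineBytes / sizeof(float);
    volatile float sink = 0.0f;  // keeps the compiler from dropping the loads
    std::size_t loads = 0;
    for (std::size_t i = first; i < last; i += stride) {
        sink = data[i];
        ++loads;
    }
    return loads;
}

// Pre-execute the reads of `num_blocks` thread blocks, starting at
// `first_block`, for the hypothetical kernel c[i] = a[i] + b[i].
// Thread block k is assumed to cover elements [k*block_dim, (k+1)*block_dim).
// The output array c is only written by the kernel, so it is not touched.
// Returns the total number of loads issued.
std::size_t pre_execute(const std::vector<float>& a, const std::vector<float>& b,
                        std::size_t block_dim, std::size_t first_block,
                        std::size_t num_blocks) {
    assert(block_dim > 0);
    const std::size_t n = std::min(a.size(), b.size());
    const std::size_t begin = first_block * block_dim;
    const std::size_t end = std::min(n, (first_block + num_blocks) * block_dim);
    if (begin >= end) {
        return 0;  // requested blocks lie entirely past the end of the data
    }
    return touch_lines(a.data(), begin, end) + touch_lines(b.data(), begin, end);
}
```

In this sketch, a host program would call `pre_execute(a, b, 256, 0, 2)` right after launching the kernel, so that the input lines for the first two thread blocks are requested by the CPU before the GPU threads reach them. Whether those lines actually land in a cache shared with the GPU depends on the platform; the sketch only shows the shape of a load-only loop over multiple thread blocks.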

Pages: 103-114

Page count: 12
