Microarchitectural Performance Characterization of Irregular GPU Kernels

Cited by: 0
Authors
O'Neil, Molly A. [1]
Burtscher, Martin [1]
Affiliations
[1] Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
CLC Number (Chinese Library Classification)
TP39 [Computer Applications]
Subject Classification Code
081203; 0835
Abstract
GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify the ways in which they differ from regular GPGPU applications. We examine the behavior of a suite of optimized irregular CUDA applications on a cycle-accurate GPU simulator. We characterize the performance bottlenecks in each program and connect source code with microarchitectural characteristics. We also assess the impact of improvements in cache and DRAM bandwidth and latency and discuss the implications for GPU architecture design. We find that, while irregular graph codes exhibit significantly more underutilized execution cycles due to branch divergence, load imbalance, and synchronization overhead than regular programs, these factors contribute less to performance degradation than we expected. It appears that code optimizations are often able to effectively address these performance hurdles. Insufficient bandwidth and long memory latency are the biggest limiters of performance. Surprisingly, we find that applications with irregular memory access patterns are more sensitive to changes in L2 latency and bandwidth than DRAM latency and bandwidth.
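The following is a minimal illustrative CUDA sketch, not drawn from the paper's benchmark suite, of the kind of kernel the abstract describes: a vertex-per-thread neighbor sum over a graph in compressed sparse row (CSR) form. All names (csrNeighborSum, rowPtr, colIdx, values, out) are hypothetical. The degree-dependent loop bound causes branch divergence and load imbalance within a warp, and the indirect colIdx[e] loads produce the data-dependent, uncoalesced memory traffic that makes such codes sensitive to cache and DRAM latency and bandwidth.

    // Illustrative sketch only; names and data layout are assumptions, not the paper's code.
    __global__ void csrNeighborSum(const int *rowPtr, const int *colIdx,
                                   const float *values, float *out, int numNodes)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;   // one vertex per thread
        if (v >= numNodes) return;

        // The trip count depends on the vertex degree, so threads in the same
        // warp iterate different numbers of times (divergence / load imbalance).
        float sum = 0.0f;
        for (int e = rowPtr[v]; e < rowPtr[v + 1]; e++) {
            // Indirect, data-dependent load: neighbor indices scatter reads
            // across memory, defeating coalescing and stressing the caches.
            sum += values[colIdx[e]];
        }
        out[v] = sum;
    }

Optimized irregular codes typically restructure such loops (for example, warp- or block-level cooperation on high-degree vertices), which is consistent with the abstract's finding that divergence and imbalance hurt less than expected once the code is tuned, leaving memory latency and bandwidth as the dominant limiters.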
Pages: 130 - 139
Number of pages: 10
Related Papers
50 records in total
  • [1] Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization
    Abdelfattah, Ahmad
    Haidar, Azzam
    Tomov, Stanimire
    Dongarra, Jack
    2018 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2018,
  • [2] The GPU on irregular computing: Performance issues and contributions
    Ujaldon, M
    Saltz, J
    NINTH INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN AND COMPUTER GRAPHICS, PROCEEDINGS, 2005, : 442 - 448
  • [3] Utilizing GPU Performance Counters to Characterize GPU Kernels via Machine Learning
    Zigon, Bob
    Song, Fengguang
    COMPUTATIONAL SCIENCE - ICCS 2020, PT I, 2020, 12137 : 88 - 101
  • [4] Performance Prediction and Ranking of SpMV Kernels on GPU Architectures
    Lehnert, Christoph
    Berrendorf, Rudolf
    Ecker, Jan P.
    Mannuss, Florian
    EURO-PAR 2016: PARALLEL PROCESSING, 2016, 9833 : 90 - 102
  • [5] GPUrdma: GPU-side library for high performance networking from GPU kernels
    Daoud, Feras
    Watad, Amir
    Silberstein, Mark
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON RUNTIME AND OPERATING SYSTEMS FOR SUPERCOMPUTERS, (ROSS 2016), 2016,
  • [6] CPU Microarchitectural Performance Characterization of Cloud Video Transcoding
    Chen, Yuhan
    Zhu, Jingyuan
    Khan, Tanvir Ahmed
    Kasikci, Baris
    2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), 2020, : 72 - 82
  • [7] Performance of CPU/GPU compiler directives on ISO/TTI kernels
    Ghosh, Sayan
    Liao, Terrence
    Calandra, Henri
    Chapman, Barbara M.
    Computing, 2014, 96 : 1149 - 1162
  • [8] Empirical performance modeling of GPU kernels using active learning
    Balaprakash, Prasanna
    Rupp, Karl
    Mametjanov, Azamat
    Gramacy, Robert B.
    Hovland, Paul D.
    Wild, Stefan M.
    PARALLEL COMPUTING: ACCELERATING COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2014, 25 : 646 - 655
  • [9] Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels
    Harb, Islam
    Feng, Wu-Chun
    2016 IEEE 24TH INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS), 2016, : 451 - 456
  • [10] A Performance Prediction Model for Memory-intensive GPU Kernels
    Hu, Zhidan
    Liu, Guangming
    2014 IEEE SYMPOSIUM ON COMPUTER APPLICATIONS AND COMMUNICATIONS (SCAC), 2014, : 14 - 18