Microarchitectural Performance Characterization of Irregular GPU Kernels

Cited by: 0
Authors
O'Neil, Molly A. [1]
Burtscher, Martin [1]
Affiliations
[1] Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
CLC Number (Chinese Library Classification)
TP39 [Computer Applications]
Subject Classification Code
081203; 0835
Abstract
GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify the ways in which they differ from regular GPGPU applications. We examine the behavior of a suite of optimized irregular CUDA applications on a cycle-accurate GPU simulator. We characterize the performance bottlenecks in each program and connect source code with microarchitectural characteristics. We also assess the impact of improvements in cache and DRAM bandwidth and latency and discuss the implications for GPU architecture design. We find that, while irregular graph codes exhibit significantly more underutilized execution cycles due to branch divergence, load imbalance, and synchronization overhead than regular programs, these factors contribute less to performance degradation than we expected. It appears that code optimizations are often able to effectively address these performance hurdles. Insufficient bandwidth and long memory latency are the biggest limiters of performance. Surprisingly, we find that applications with irregular memory access patterns are more sensitive to changes in L2 latency and bandwidth than DRAM latency and bandwidth.
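The following is a minimal illustrative CUDA sketch, not drawn from the paper's benchmark suite, of the kind of kernel the abstract describes: a vertex-per-thread neighbor sum over a graph in compressed sparse row (CSR) form. All names (csrNeighborSum, rowPtr, colIdx, values, out) are hypothetical. The degree-dependent loop bound causes branch divergence and load imbalance within a warp, and the indirect colIdx[e] loads produce the data-dependent, uncoalesced memory traffic that makes such codes sensitive to cache and DRAM latency and bandwidth.

    // Illustrative sketch only; names and data layout are assumptions, not the paper's code.
    __global__ void csrNeighborSum(const int *rowPtr, const int *colIdx,
                                   const float *values, float *out, int numNodes)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;   // one vertex per thread
        if (v >= numNodes) return;

        // The trip count depends on the vertex degree, so threads in the same
        // warp iterate different numbers of times (divergence / load imbalance).
        float sum = 0.0f;
        for (int e = rowPtr[v]; e < rowPtr[v + 1]; e++) {
            // Indirect, data-dependent load: neighbor indices scatter reads
            // across memory, defeating coalescing and stressing the caches.
            sum += values[colIdx[e]];
        }
        out[v] = sum;
    }

Optimized irregular codes typically restructure such loops (for example, warp- or block-level cooperation on high-degree vertices), which is consistent with the abstract's finding that divergence and imbalance hurt less than expected once the code is tuned, leaving memory latency and bandwidth as the dominant limiters.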
Pages: 130 - 139
Number of pages: 10
Related Papers
50 records in total
  • [1] Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization
    Abdelfattah, Ahmad
    Haidar, Azzam
    Tomov, Stanimire
    Dongarra, Jack
    2018 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2018,
  • [2] The GPU on irregular computing: Performance issues and contributions
    Ujaldon, M
    Saltz, J
    NINTH INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN AND COMPUTER GRAPHICS, PROCEEDINGS, 2005, : 442 - 448
  • [3] Utilizing GPU Performance Counters to Characterize GPU Kernels via Machine Learning
    Zigon, Bob
    Song, Fengguang
    COMPUTATIONAL SCIENCE - ICCS 2020, PT I, 2020, 12137 : 88 - 101
  • [4] Performance Prediction and Ranking of SpMV Kernels on GPU Architectures
    Lehnert, Christoph
    Berrendorf, Rudolf
    Ecker, Jan P.
    Mannuss, Florian
    EURO-PAR 2016: PARALLEL PROCESSING, 2016, 9833 : 90 - 102
  • [5] GPUrdma: GPU-side library for high performance networking from GPU kernels
    Daoud, Feras
    Watad, Amir
    Silberstein, Mark
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON RUNTIME AND OPERATING SYSTEMS FOR SUPERCOMPUTERS, (ROSS 2016), 2016,
  • [6] CPU Microarchitectural Performance Characterization of Cloud Video Transcoding
    Chen, Yuhan
    Zhu, Jingyuan
    Khan, Tanvir Ahmed
    Kasikci, Baris
    2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), 2020, : 72 - 82
  • [7] Performance of CPU/GPU compiler directives on ISO/TTI kernels
    Ghosh, Sayan
    Liao, Terrence
    Calandra, Henri
    Chapman, Barbara M.
    Computing, 2014, 96 : 1149 - 1162
  • [8] Empirical performance modeling of GPU kernels using active learning
    Balaprakash, Prasanna
    Rupp, Karl
    Mametjanov, Azamat
    Gramacy, Robert B.
    Hovland, Paul D.
    Wild, Stefan M.
    PARALLEL COMPUTING: ACCELERATING COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2014, 25 : 646 - 655
  • [9] Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels
    Harb, Islam
    Feng, Wu-Chun
    2016 IEEE 24TH INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS), 2016, : 451 - 456
  • [10] A Performance Prediction Model for Memory-intensive GPU Kernels
    Hu, Zhidan
    Liu, Guangming
    2014 IEEE SYMPOSIUM ON COMPUTER APPLICATIONS AND COMMUNICATIONS (SCAC), 2014, : 14 - 18