Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs

Cited by: 2
Authors
Cherian, Aaron Thomas [1]
Zhou, Keren [1]
Grubisic, Dejan [1]
Meng, Xiaozhu [1]
Mellor-Crummey, John [1]
Affiliations
[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
Keywords
Supercomputers; High performance computing; Performance analysis; Parallel programming
DOI
10.1109/ProTools54808.2021.00009
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy's forthcoming exascale systems. Since the execution model for GPUs differs from that for conventional processors, applications need to be rewritten to exploit GPU parallelism. Performance tools are needed for such GPU-accelerated systems to help developers assess how well applications offload computation onto GPUs. In this paper, we describe extensions to Rice University's HPCToolkit performance tools that support measurement and analysis of Intel's DPC++ programming model for GPU-accelerated systems atop an implementation of the industry-standard OpenCL framework for heterogeneous parallelism on Intel GPUs. HPCToolkit supports three techniques for performance analysis of programs atop OpenCL on Intel GPUs. First, HPCToolkit supports profiling and tracing of OpenCL kernels. Second, HPCToolkit supports CPU-GPU blame shifting for OpenCL kernel executions, a profiling technique that can identify code that executes on one or more CPUs while GPUs are idle. Third, HPCToolkit supports fine-grained measurement, analysis, and attribution of performance metrics to OpenCL GPU kernels, including instruction counts, execution latency, and SIMD waste. The paper describes these capabilities and then illustrates their application in case studies with two applications that offload computations onto Intel GPUs.
Pages: 26-35
Page count: 10
Related papers
50 in total
  • [31] GPU-Accelerated Mahalanobis-Average Hierarchical Clustering Analysis
    Smelko, Adam
    Kratochvil, Miroslav
    Krulis, Martin
    Sieger, Tomas
    EURO-PAR 2021: PARALLEL PROCESSING, 2021, 12820 : 580 - 595
  • [32] Consistently GPU-Accelerated Graph Visualization
    Panagiotidis, Alexandros
    Reina, Guido
    Burch, Michael
    Pfannkuch, Tilo
    Ertl, Thomas
    8TH INTERNATIONAL SYMPOSIUM ON VISUAL INFORMATION COMMUNICATION AND INTERACTION (VINCI 2015), 2015, : 35 - 41
  • [33] GPU-Accelerated Algorithm for Polygon Reconstruction
    Ji, Ruian
    Niu, Zhirui
    Chen, Lan
    APPLIED SCIENCES-BASEL, 2025, 15 (03)
  • [34] GPU-accelerated computation of electron transfer
    Hoefinger, Siegfried
    Acocella, Angela
    Pop, Sergiu C.
    Narumi, Tetsu
    Yasuoka, Kenji
    Beu, Titus
    Zerbetto, Francesco
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2012, 33 (29) : 2351 - 2356
  • [35] GPU-accelerated model for fast, three-dimensional fluid-structure interaction computations
    Nita, Cosmin
    Itu, Lucian
    Mihalef, Viorel
    Sharma, Puneet
    Rapaka, Saikiran
    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015, : 965 - 968
  • [36] GPU-accelerated and pipelined methylation calling
    Feng, Yilin
    Akbulut, Gulsum Gudukbay
    Tang, Xulong
    Gunasekaran, Jashwant Raj
    Rahman, Amatur
    Medvedev, Paul
    Kandemir, Mahmut
    BIOINFORMATICS ADVANCES, 2022, 2 (01)
  • [37] Challenges in GPU-Accelerated Nonlinear Dynamic Analysis for Structural Systems
    Simpson, Barbara G.
    Zhu, Minjie
    Seki, Akiri
    Scott, Michael
    JOURNAL OF STRUCTURAL ENGINEERING, 2023, 149 (03)
  • [38] A GPU-Accelerated Framework for Path-Based Timing Analysis
    Guo, Guannan
    Huang, Tsung-Wei
    Lin, Yibo
    Guo, Zizheng
    Yellapragada, Sushma
    Wong, Martin D. F.
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42 (11) : 4219 - 4232
  • [39] A Tool for Bottleneck Analysis and Performance Prediction for GPU-accelerated Applications
    Madougou, Souley
    Varbanescu, Ana Lucia
    de Laat, Cees
    van Nieuwpoort, Rob
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 641 - 652
  • [40] An Effective Matrix Compression Method for GPU-Accelerated Thermal Analysis
    Chiou, Lih-Yih
    Lu, Liang-Ying
    Lin, Chieh-Yu
    2015 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), 2015