Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs

Cited by: 2

Authors:

Cherian, Aaron Thomas [1]

Zhou, Keren [1]

Grubisic, Dejan [1]

Meng, Xiaozhu [1]

Mellor-Crummey, John [1]

Affiliation:

[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA

Source:

PROCEEDINGS OF WORKSHOP ON PROGRAMMING AND PERFORMANCE VISUALIZATION TOOLS (PROTOOLS 2021) | 2021

Keywords:

Supercomputers; High performance computing; Performance analysis; Parallel programming;

DOI:

10.1109/ProTools54808.2021.00009

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy's forthcoming exascale systems. Since the execution model for GPUs differs from that for conventional processors, applications need to be rewritten to exploit GPU parallelism. Performance tools are needed for such GPU-accelerated systems to help developers assess how well applications offload computation onto GPUs. In this paper, we describe extensions to Rice University's HPCToolkit performance tools that support measurement and analysis of Intel's DPC++ programming model for GPU-accelerated systems atop an implementation of the industry-standard OpenCL framework for heterogeneous parallelism on Intel GPUs. HPCToolkit supports three techniques for performance analysis of programs atop OpenCL on Intel GPUs. First, HPCToolkit supports profiling and tracing of OpenCL kernels. Second, HPCToolkit supports CPU-GPU blame shifting for OpenCL kernel executions, a profiling technique that can identify code that executes on one or more CPUs while GPUs are idle. Third, HPCToolkit supports fine-grained measurement, analysis, and attribution of performance metrics to OpenCL GPU kernels, including instruction counts, execution latency, and SIMD waste. The paper describes these capabilities and then illustrates their application in case studies with two applications that offload computations onto Intel GPUs.
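The idea behind CPU-GPU blame shifting can be illustrated with a minimal sketch. This is not HPCToolkit's implementation; the function name `gpu_idle_blame`, the input layout, and the sample data are all hypothetical and chosen only for illustration. The sketch assumes a list of timestamped CPU samples and a list of intervals during which a GPU kernel was running, and it charges GPU idle time to whatever CPU code was sampled while no kernel was active.

```python
from collections import defaultdict


def gpu_idle_blame(cpu_samples, kernel_intervals, sample_period):
    """Attribute GPU idle time to the CPU code running in the meantime.

    cpu_samples: list of (timestamp, context) pairs from a CPU sampling
        profiler (hypothetical format).
    kernel_intervals: list of (start, end) times during which a GPU
        kernel was executing; end is exclusive.
    sample_period: amount of time each CPU sample represents.
    """
    blame = defaultdict(float)
    for timestamp, context in cpu_samples:
        # The GPU is busy if the sample falls inside any kernel interval.
        gpu_busy = any(start <= timestamp < end
                       for start, end in kernel_intervals)
        if not gpu_busy:
            # GPU is idle: blame the CPU context observed at this sample.
            blame[context] += sample_period
    return dict(blame)


# Illustrative data only: one kernel runs during [2, 5).
samples = [(0, "read_input"), (1, "read_input"), (2, "launch_kernel"),
           (3, "wait_for_gpu"), (4, "wait_for_gpu"), (5, "write_output")]
kernels = [(2, 5)]
result = gpu_idle_blame(samples, kernels, sample_period=1.0)
print(result)
```

In this toy run, only `read_input` and `write_output` accumulate blame, because they are the contexts sampled while no kernel was executing. A real tool would work with full calling contexts and multiple threads and devices rather than flat labels.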


Pages: 26-35

Page count: 10
