Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs

Times cited: 2

Authors:

Cherian, Aaron Thomas [1]

Zhou, Keren [1]

Grubisic, Dejan [1]

Meng, Xiaozhu [1]

Mellor-Crummey, John [1]

Affiliation:

[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA

Source:

PROCEEDINGS OF WORKSHOP ON PROGRAMMING AND PERFORMANCE VISUALIZATION TOOLS (PROTOOLS 2021) | 2021

Keywords:

Supercomputers; High performance computing; Performance analysis; Parallel programming

DOI:

10.1109/ProTools54808.2021.00009

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification code:

081202; 0835

Abstract:

Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy's forthcoming exascale systems. Since the execution model for GPUs differs from that for conventional processors, applications need to be rewritten to exploit GPU parallelism. Performance tools are needed for such GPU-accelerated systems to help developers assess how well applications offload computation onto GPUs. In this paper, we describe extensions to Rice University's HPCToolkit performance tools that support measurement and analysis of Intel's DPC++ programming model for GPU-accelerated systems atop an implementation of the industry-standard OpenCL framework for heterogeneous parallelism on Intel GPUs. HPCToolkit supports three techniques for performance analysis of programs atop OpenCL on Intel GPUs. First, HPCToolkit supports profiling and tracing of OpenCL kernels. Second, HPCToolkit supports CPU-GPU blame shifting for OpenCL kernel executions, a profiling technique that can identify code that executes on one or more CPUs while GPUs are idle. Third, HPCToolkit supports fine-grained measurement, analysis, and attribution of performance metrics to OpenCL GPU kernels, including instruction counts, execution latency, and SIMD waste. The paper describes these capabilities and then illustrates their application in case studies with two applications that offload computations onto Intel GPUs.
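
The blame-shifting idea described in the abstract can be illustrated with a minimal sketch. It assumes that GPU kernel execution intervals and periodic CPU samples are already available as plain (start, end) and (timestamp, function) tuples in nanoseconds. The function name blame_cpu_code and its input format are hypothetical. This is not HPCToolkit's implementation, which works on OpenCL kernel measurements and full calling contexts. The sketch charges each CPU sample taken while no kernel is running to the sampled function.

```python
from bisect import bisect_right
from collections import Counter


def blame_cpu_code(kernel_intervals, cpu_samples, sample_period_ns):
    """Hypothetical sketch of CPU-GPU blame shifting.

    kernel_intervals: iterable of (start_ns, end_ns) GPU kernel executions,
        treated as half-open intervals [start, end).
    cpu_samples: iterable of (timestamp_ns, function_name) CPU samples.
    Returns a dict mapping function name -> CPU time (ns) sampled while the GPU was idle.
    """
    # Merge overlapping kernel intervals so GPU-busy time is a union of disjoint spans.
    merged = []
    for start, end in sorted(kernel_intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]

    blame = Counter()
    for t, func in cpu_samples:
        # Index of the last merged interval that starts at or before t (-1 if none).
        i = bisect_right(starts, t) - 1
        gpu_busy = i >= 0 and t < merged[i][1]
        if not gpu_busy:
            # The GPU is idle at this sample, so blame the CPU code that is running.
            blame[func] += sample_period_ns
    return dict(blame)
```

For example, with kernels at [(0, 100), (50, 150), (300, 400)] and a 10 ns sample period, samples of a hypothetical pack_buffers function at t=200 and t=250 fall in a GPU-idle gap and accumulate 20 ns of blame. A sample at t=350 overlaps a kernel and is not blamed.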

Pages: 26-35

Number of pages: 10