Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs

Cited by: 2
Authors
Cherian, Aaron Thomas [1]
Zhou, Keren [1]
Grubisic, Dejan [1]
Meng, Xiaozhu [1]
Mellor-Crummey, John [1]
Affiliations
[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
Keywords
Supercomputers; High performance computing; Performance analysis; Parallel programming
DOI
10.1109/ProTools54808.2021.00009
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy's forthcoming exascale systems. Since the execution model for GPUs differs from that for conventional processors, applications need to be rewritten to exploit GPU parallelism. Performance tools are needed for such GPU-accelerated systems to help developers assess how well applications offload computation onto GPUs. In this paper, we describe extensions to Rice University's HPCToolkit performance tools that support measurement and analysis of Intel's DPC++ programming model for GPU-accelerated systems atop an implementation of the industry-standard OpenCL framework for heterogeneous parallelism on Intel GPUs. HPCToolkit supports three techniques for performance analysis of programs atop OpenCL on Intel GPUs. First, HPCToolkit supports profiling and tracing of OpenCL kernels. Second, HPCToolkit supports CPU-GPU blame shifting for OpenCL kernel executions, a profiling technique that can identify code that executes on one or more CPUs while GPUs are idle. Third, HPCToolkit supports fine-grained measurement, analysis, and attribution of performance metrics to OpenCL GPU kernels, including instruction counts, execution latency, and SIMD waste. The paper describes these capabilities and then illustrates their application in case studies with two applications that offload computations onto Intel GPUs.
Pages: 26-35
Number of pages: 10