A tool for top-down performance analysis of GPU-accelerated applications

Cited by: 6
Authors
Zhou, Keren [1]
Krentel, Mark [1]
Mellor-Crummey, John [1]
Affiliation
[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
Keywords
GPU; Profiler; Wait-free data structure; Calling context tree; HPC; Roofline;
DOI
10.1145/3332466.3374534
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
To support performance measurement and analysis of GPU-accelerated applications, we extended the HPCToolkit performance tools with several novel features. To support efficient monitoring of accelerated applications, HPCToolkit employs a new wait-free data structure to coordinate measurement and attribution between each application thread and a GPU monitor thread. To help developers understand the performance of accelerated applications, HPCToolkit attributes metrics to heterogeneous calling contexts that span both CPUs and GPUs. To support fine-grain analysis and tuning of GPU-accelerated code, HPCToolkit collects PC samples of both CPU and GPU activity to derive and attribute metrics at all levels in a heterogeneous calling context.
Pages: 415-416
Number of pages: 2
Related papers
50 in total
  • [41] Performance Comparison of GPU-Accelerated Particle Flow and Particle Filters
    Jilkov, Vesselin P.
    Wu, Jiande
    Chen, Huimin
    2013 16TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2013, : 1095 - 1102
  • [42] GPU-accelerated performance on numerically solving multimode Schrodinger equation
    Gong Si-yu
    Zhang Jian-yong
    SIXTH SYMPOSIUM ON NOVEL OPTOELECTRONIC DETECTION TECHNOLOGY AND APPLICATIONS, 2020, 11455
  • [43] Performance comparison of GPU-accelerated fast motion estimation method
    Chen, Pengcheng
    Peng, Bo
    Zou, Anxin
    Xu, Luwen
    2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 660 - 665
  • [44] NOISE-ANALYSIS TOOL SUPPORTS TOP-DOWN DESIGN METHODOLOGY
    DONLIN, M
    COMPUTER DESIGN, 1994, 33 (01): : 115 - 115
  • [45] A GPU-Accelerated Fast Multipole Method for GROMACS: Performance and Accuracy
    Kohnke, Bartosz
    Kutzner, Carsten
    Grubmueller, Helmut
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2020, 16 (11) : 6938 - 6949
  • [46] Top-Down Garbage Collector: a tool for selecting high-quality top-down proteomics mass spectra
    Lima, Diogo B.
    Silva, Andre R. F.
    Dupre, Mathieu
    Santos, Marlon D. M.
    Clasen, Milan A.
    Kurt, Louise U.
    Aquino, Priscila F.
    Barbosa, Valmir C.
    Carvalho, Paulo C.
    Chamot-Rooke, Julia
    BIOINFORMATICS, 2019, 35 (18) : 3489 - 3490
  • [47] Top-down proteomics for the analysis of proteolytic events - Methods, applications and perspectives
    Tholey, Andreas
    Becker, Alexander
    BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH, 2017, 1864 (11): : 2191 - 2199
  • [48] Efficient MPI-based Communication for GPU-Accelerated Dask Applications
    Shafi, Aamir
    Hashmi, Jahanzeb Maqbool
    Subramoni, Hari
    Panda, Dhabaleswar K.
    21ST IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2021), 2021, : 277 - 286
  • [49] Applications of GPU-accelerated replica exchange molecular dynamic simulations of proteins
    Wang, Kai
    Shirts, Michael R.
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2012, 244
  • [50] GPU-Accelerated Progressive Gaussian Filtering with Applications to Extended Object Tracking
    Steinbring, Jannik
    Hanebeck, Uwe D.
    2015 18TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2015, : 1038 - 1045