A Performance Prediction Model for Memory-intensive GPU Kernels

Cited by: 3
Authors
Hu, Zhidan [1]
Liu, Guangming [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Hunan, Peoples R China
Keywords
GPU; CUDA; performance prediction; memory-intensive
DOI
10.1109/SCAC.2014.10
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, combining a large array of processing cores with the CUDA (Compute Unified Device Architecture) programming model and its C-like interface. However, optimizing an application for maximum performance on the GPU architecture is not a trivial task, because the shift from conventional multi-core to many-core architectures is substantial. In addition, GPU vendors disclose little detail about the characteristics of the GPU architecture. To provide insight into the performance of memory-intensive kernels, we propose a pipelined global memory model that incorporates the most critical factor in global memory performance, the uncoalesced memory access pattern, and provides a basis for predicting the performance of memory-intensive kernels. As we will demonstrate, the pipeline throughput is dynamic and sensitive to memory access patterns. We validated our model on NVIDIA GPUs using CUDA. The experimental results show that the pipeline captures the performance factors related to global memory and that the proposed model can estimate the performance of memory-intensive GPU kernels.
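As a rough illustration of why the uncoalesced access pattern matters, the sketch below counts how many memory segments a single warp touches when its threads read elements at a fixed stride. It is not the paper's pipelined model. The function name `transactions_per_warp` and the parameters (32 threads per warp, 4-byte elements, 128-byte segments) are assumptions chosen for illustration; actual transaction sizes depend on the GPU generation.

```python
def transactions_per_warp(stride_elems, elem_bytes=4, warp_size=32, segment_bytes=128):
    """Count the distinct memory segments touched by one warp's strided load.

    Assumed, illustrative parameters: lane i reads the element at index
    i * stride_elems, and memory is served in aligned segments of
    segment_bytes bytes.
    """
    segments = set()
    for lane in range(warp_size):
        first_byte = lane * stride_elems * elem_bytes
        last_byte = first_byte + elem_bytes - 1
        # An element may straddle a segment boundary, so record every segment it covers.
        segments.update(range(first_byte // segment_bytes,
                              last_byte // segment_bytes + 1))
    return len(segments)


# A unit stride is fully coalesced; larger strides touch more segments.
for stride in (1, 2, 8, 32):
    print(f"stride={stride:2d} -> {transactions_per_warp(stride)} segment(s)")
```

Under these assumptions a unit stride needs one segment per warp, while a stride of 32 elements needs 32, which is the kind of access-pattern sensitivity the abstract attributes to global memory throughput.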
Pages: 14-18
Number of pages: 5