A Performance Prediction Model for Memory-intensive GPU Kernels

Times cited: 3
Authors
Hu, Zhidan [1]
Liu, Guangming [1]
Affiliation
[1] Natl Univ Def Technol, Coll Comp, Changsha, Hunan, Peoples R China
Keywords
GPU; CUDA; performance prediction; memory-intensive
DOI
10.1109/SCAC.2014.10
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, combining a large array of processing cores with CUDA (Compute Unified Device Architecture), a programming model with a C-like interface. However, tuning an application for maximum performance on a GPU is not trivial, because of the substantial shift from conventional multi-core to many-core architectures. Moreover, GPU vendors disclose few details about the characteristics of their architectures. To provide insight into the performance of memory-intensive kernels, we propose a pipelined global memory model that incorporates the most critical factor affecting global memory performance, namely uncoalesced memory access patterns, and provides a basis for predicting the performance of memory-intensive kernels. As we demonstrate, the pipeline throughput is dynamic and sensitive to memory access patterns. We validated the model on NVIDIA GPUs using CUDA. The experimental results show that the pipeline model captures the performance factors related to global memory and can estimate the performance of memory-intensive GPU kernels.
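The abstract does not give the model's equations. As a rough illustration of the two ideas it names, uncoalesced access patterns increasing the number of global memory transactions and global memory behaving like a pipeline whose throughput depends on the access pattern, the Python sketch below counts the aligned memory segments touched by one 32-thread warp for a given stride, then feeds that count into a generic pipeline timing formula (first transaction pays the full latency, each later one adds only an issue interval). The 128-byte segment size, the latency and issue-interval values, and the function names `transactions_per_warp` and `pipelined_time` are hypothetical assumptions chosen for illustration; this is not the authors' model.

```python
# Hypothetical sketch, NOT the paper's model: counts global memory
# transactions per warp for a strided access pattern and applies a
# generic pipeline timing formula. Segment size, latency, and issue
# interval are illustrative assumptions.

WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # assumed size of one aligned memory transaction


def transactions_per_warp(stride_elems: int, elem_bytes: int = 4,
                          base_addr: int = 0) -> int:
    """Count distinct aligned segments touched when thread t of a warp
    reads address base_addr + t * stride_elems * elem_bytes."""
    segments = {
        (base_addr + t * stride_elems * elem_bytes) // SEGMENT_BYTES
        for t in range(WARP_SIZE)
    }
    return len(segments)


def pipelined_time(num_transactions: int, latency_cycles: float,
                   issue_interval_cycles: float) -> float:
    """Generic pipeline estimate: the first transaction pays the full
    latency; each subsequent one adds only the issue interval."""
    if num_transactions <= 0:
        return 0.0
    return latency_cycles + (num_transactions - 1) * issue_interval_cycles


if __name__ == "__main__":
    warps = 1024  # hypothetical number of warps issuing one load each
    for stride in (1, 2, 8, 32):
        per_warp = transactions_per_warp(stride)
        cycles = pipelined_time(per_warp * warps, latency_cycles=400.0,
                                issue_interval_cycles=4.0)
        print(f"stride={stride:2d} transactions/warp={per_warp:2d} "
              f"estimated cycles={cycles:.0f}")
```

With unit stride and 4-byte elements, a warp touches a single 128-byte segment; at a stride of 32 elements every thread lands in its own segment, so the transaction count, and therefore the estimated time, grows roughly 32-fold. A misaligned base address (e.g. `base_addr=64`) splits even a unit-stride access across two segments.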
Pages: 14-18
Number of pages: 5