A Performance Prediction Model for Memory-intensive GPU Kernels

Cited by: 3
Authors
Hu, Zhidan [1 ]
Liu, Guangming [1 ]
Affiliation
[1] Natl Univ Def Technol, Coll Comp, Changsha, Hunan, Peoples R China
Keywords
GPU; CUDA; performance prediction; memory-intensive
DOI
10.1109/SCAC.2014.10
CLC number
TP [automation and computer technology]
Discipline code
0812
Abstract
Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, built on a large array of processing cores and the CUDA programming model with a C-like interface. However, optimizing an application for maximum performance on the GPU architecture is not a trivial task, owing to the tremendous shift from conventional multi-core to many-core architectures. Moreover, GPU vendors disclose few details about the characteristics of the GPU architecture. To provide insight into the performance of memory-intensive kernels, we propose a pipelined global memory model that incorporates the most critical factor affecting global memory performance, the uncoalesced memory access pattern, and provides a basis for predicting the performance of memory-intensive kernels. As we demonstrate, the pipeline throughput is dynamic and sensitive to memory access patterns. We validated our model on NVIDIA GPUs using CUDA (Compute Unified Device Architecture). The experimental results show that the pipeline model captures the performance factors related to global memory and can estimate the performance of memory-intensive GPU kernels.
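The abstract singles out uncoalesced memory access as the dominant factor in global memory performance. A minimal sketch of why, independent of the paper's actual model: on NVIDIA GPUs a 32-thread warp's loads are serviced in fixed-size memory transactions (128 bytes assumed here), so the number of distinct segments a warp touches sets its memory cost. The function below counts those segments for a strided float-array read; all constants and the access pattern are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not the paper's model): count the 128-byte global
# memory transactions one 32-thread warp generates for a strided read.
WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # assumed transaction granularity
WORD_BYTES = 4        # each thread loads one 4-byte float

def transactions_per_warp(stride):
    """Distinct 128-byte segments touched when thread i reads
    element i * stride of a float array starting at offset 0."""
    segments = {(i * stride * WORD_BYTES) // SEGMENT_BYTES
                for i in range(WARP_SIZE)}
    return len(segments)

# Coalesced: consecutive threads read consecutive words -> 1 transaction.
print(transactions_per_warp(1))    # 1
# Uncoalesced: stride-32 reads scatter into 32 separate transactions.
print(transactions_per_warp(32))   # 32
```

A 32x difference in transaction count for the same number of useful bytes is exactly the kind of pattern-dependent throughput variation a pipelined global memory model has to capture.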
Pages: 14-18
Page count: 5
Related papers
50 records in total
  • [41] A FEEDBACK CONTROL MECHANISM FOR BALANCING I/O- AND MEMORY-INTENSIVE APPLICATIONS ON CLUSTERS
    Qin, Xiao
    Jiang, Hong
    Zhu, Yifeng
    Swanson, David R.
    SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2005, 6 (04): : 95 - 107
  • [42] An associative capacitive network based on nanoscale complementary resistive switches for memory-intensive computing
    Kavehei, Omid
    Linn, Eike
    Nielen, Lutz
    Tappertzhofen, Stefan
    Skafidas, Efstratios
    Valov, Ilia
    Waser, Rainer
    NANOSCALE, 2013, 5 (11) : 5119 - 5128
  • [43] MAINFRAME IMAGE-PROCESSING SPEED ACHIEVED IN PERSONAL COMPUTERS WITH MEMORY-INTENSIVE ALGORITHMS
    PRATT, JP
    LEAR, JL
    RADIOLOGY, 1992, 185 : 251 - 251
  • [44] ONE-CHIP MICROCOMPUTER EXCELS IN I/O AND MEMORY-INTENSIVE USES.
    Peuto, Bernard L.
    Prosenko, Gary J.
    Estrin, Judy
    Bass, Charles
    Electronics, 1978, 51 (18): : 128 - 137
  • [45] ON-DEMAND-FORK: A Microsecond Fork for Memory-Intensive and Latency-Sensitive Applications
    Zhao, Kaiyang
    Gong, Sishuai
    Fonseca, Pedro
    PROCEEDINGS OF THE SIXTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '21), 2021, : 540 - 555
  • [46] An energy-efficient scheduling approach for memory-intensive tasks in multi-core systems
    Maurya A.K.
    Meena A.
    Singh D.
    Kumar V.
    International Journal of Information Technology, 2022, 14 (6) : 2793 - 2801
  • [47] Analytic performance model for parallel overlapping memory-bound kernels
    Afzal, Ayesha
    Hager, Georg
    Wellein, Gerhard
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (10):
  • [48] Optimized On-Chip-Pipelining for Memory-Intensive Computations on Multi-Core Processors with Explicit Memory Hierarchy
    Keller, Joerg
    Kessler, Christoph W.
    Hulten, Rikard
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2012, 18 (14) : 1987 - 2023
  • [49] Analyzing data locality in GPU kernels using memory footprint analysis
    Kiani, Mohsen
    Rajabzadeh, Amir
    SIMULATION MODELLING PRACTICE AND THEORY, 2019, 91 : 102 - 122
  • [50] GPUrdma: GPU-side library for high performance networking from GPU kernels
    Daoud, Feras
    Watad, Amir
    Silberstein, Mark
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON RUNTIME AND OPERATING SYSTEMS FOR SUPERCOMPUTERS, (ROSS 2016), 2016,