A Performance Prediction Model for Memory-intensive GPU Kernels

Cited by: 3

Authors:

Hu, Zhidan [1]

Liu, Guangming [1]


Affiliations:

[1] National University of Defense Technology, College of Computer, Changsha, Hunan, People's Republic of China

Source:

2014 IEEE SYMPOSIUM ON COMPUTER APPLICATIONS AND COMMUNICATIONS (SCAC) | 2014

Keywords:

GPU; CUDA; performance prediction; memory-intensive

DOI:

10.1109/SCAC.2014.10

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, thanks to a large array of processing cores and the CUDA (Compute Unified Device Architecture) programming model with its C-like interface. However, optimizing an application for maximum performance on the GPU architecture is not a trivial task because of the tremendous change from conventional multi-core to many-core architectures. Moreover, GPU vendors do not disclose much detail about the characteristics of the GPU architecture. To provide insight into the performance of memory-intensive kernels, we propose a pipelined global memory model that incorporates the most critical factor in global memory performance, the uncoalesced memory access pattern, and provides a basis for predicting the performance of memory-intensive kernels. As we will demonstrate, the pipeline throughput is dynamic and sensitive to the memory access pattern. We validated our model on NVIDIA GPUs using CUDA. The experimental results show that the pipeline captures the performance factors related to global memory and that the proposed model can estimate the performance of memory-intensive GPU kernels.
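To illustrate why the uncoalesced access pattern named in the abstract matters, the following is a minimal sketch that counts how many distinct memory segments one warp touches for a given access stride. It is not the paper's pipelined model. The function name `transactions_per_warp` is hypothetical, and the warp size of 32 threads, 4-byte elements, and 128-byte segments are assumptions chosen for illustration.

```python
def transactions_per_warp(stride, warp_size=32, elem_bytes=4, segment_bytes=128):
    """Count the distinct memory segments touched when lane i reads element i * stride.

    All default parameters are illustrative assumptions, not values from the paper.
    """
    # Each lane's byte address is mapped to the index of the segment that contains it.
    segments = {
        (lane * stride * elem_bytes) // segment_bytes
        for lane in range(warp_size)
    }
    return len(segments)


if __name__ == "__main__":
    # Larger strides spread the warp's accesses over more segments.
    for stride in (1, 2, 8, 32):
        print(f"stride={stride}: {transactions_per_warp(stride)} segment(s)")
```

Under these assumptions, a unit stride keeps the whole warp inside one segment, while a stride of 32 elements places every lane in its own segment, so the same amount of useful data requires many more memory transactions.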


Pages: 14-18

Number of pages: 5
