A Performance Prediction Model for Memory-intensive GPU Kernels

Cited by: 3

Authors:

Hu, Zhidan [1]

Liu, Guangming [1]

Affiliations:

[1] National University of Defense Technology, College of Computer, Changsha, Hunan, People's Republic of China

Source:

2014 IEEE Symposium on Computer Applications and Communications (SCAC) | 2014

Keywords:

GPU; CUDA; performance prediction; memory-intensive

DOI:

10.1109/SCAC.2014.10

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, thanks to their large arrays of processing cores and the CUDA programming model with its C-like interface. However, optimizing an application for maximum performance on the GPU architecture is not a trivial task, because of the substantial shift from conventional multi-core to many-core architectures. Moreover, GPU vendors disclose little detail about the characteristics of their architectures. To provide insight into the performance of memory-intensive kernels, we propose a pipelined global memory model that incorporates the most critical factor in global memory performance, the uncoalesced memory access pattern, and provides a basis for predicting the performance of memory-intensive kernels. As we will demonstrate, the pipeline throughput is dynamic and sensitive to memory access patterns. We validated our model on NVIDIA GPUs using CUDA (Compute Unified Device Architecture). The experimental results show that the pipeline captures the performance factors related to global memory and that the proposed model can estimate the performance of memory-intensive GPU kernels.
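
As a rough illustration of why the memory access pattern matters, the sketch below counts how many memory segments a single warp touches for a given access stride. It is not the pipelined model proposed in the paper. The function name `transactions_per_warp` and the default parameters (32 threads per warp, 4-byte elements, 128-byte aligned segments, base address 0) are assumptions chosen for illustration; actual transaction sizes depend on the GPU generation.

```python
def transactions_per_warp(stride_elems, elem_bytes=4, warp_size=32, segment_bytes=128):
    """Count the distinct memory segments touched by one warp.

    Assumes (for illustration only) that lane i reads the element at index
    i * stride_elems, that the base address is 0 and segment-aligned, and
    that each distinct segment costs one memory transaction.
    """
    segments = {
        (lane * stride_elems * elem_bytes) // segment_bytes
        for lane in range(warp_size)
    }
    return len(segments)


if __name__ == "__main__":
    useful_bytes = 32 * 4  # bytes the warp actually needs
    for stride in (1, 2, 8, 32):
        n = transactions_per_warp(stride)
        efficiency = useful_bytes / (n * 128)  # fraction of moved bytes that are used
        print(f"stride={stride:2d}  transactions={n:2d}  efficiency={efficiency:.3f}")
```

Under these assumptions, a unit stride needs one transaction per warp, while a stride of 32 elements needs 32, so the same amount of useful data costs far more memory traffic when accesses are uncoalesced.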

Citation

Pages: 14-18

Number of pages: 5

