Hybrid, Scalable, Trace-Driven Performance Modeling of GPGPUs

Cited by: 7

Authors:

Arafa, Yehia ^{[1]}

Badawy, Abdel-Hameed ^{[1]}

ElWazir, Ammar ^{[1]}

Barai, Atanu ^{[1]}

Eker, Ali ^{[2]}

Chennupati, Gopinath ^{[3]}

Santhi, Nandakishore ^{[4]}

Eidenbenz, Stephan ^{[4]}

Affiliations:

[1] New Mexico State Univ, Klipsch Sch ECE, Las Cruces, NM 88003 USA

[2] Binghamton Univ, Binghamton, NY USA

[3] Amazon Alexa, New York, NY USA

[4] Los Alamos Natl Lab, Los Alamos, NM USA

Source:

SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2021

Keywords:

NVIDIA GPUs; Modeling and Simulation; Design Space Exploration; Performance Prediction; PTX; SASS; GPU; ROOFLINE;

DOI:

10.1145/3458817.3476221

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs. PPT-GPU achieves scalability through a hybrid high-level modeling approach where some computations are extrapolated and multiple parts of the model are parallelized. The tool's primary prediction models use pre-collected memory and instruction traces of the workloads to accurately capture the dynamic behavior of the kernels. PPT-GPU reports an extensive array of GPU performance metrics accurately while being easily extensible. We use a broad set of benchmarks to verify prediction accuracy. We compare the results against hardware metrics collected using vendor profiling tools and cycle-accurate simulators. The results show that the performance predictions are highly correlated to the actual hardware (MAPE: < 16% and Correlation: > 0.98). Moreover, PPT-GPU is orders of magnitude faster than cycle-accurate simulators. The comprehensiveness of the collected metrics can guide architects and developers in performing design space explorations. Moreover, the scalability of the tool enables efficient and fast sensitivity analyses for performance-critical applications.
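The abstract reports accuracy as a mean absolute percentage error (MAPE) and a correlation coefficient between predicted and measured values. The sketch below shows how such figures are commonly computed. It assumes the standard MAPE definition and the Pearson correlation; the record does not state which definitions the authors used. The function names and the sample cycle counts are hypothetical and are not results from the paper.

```python
import math


def mape(predicted, measured):
    """Mean absolute percentage error, in percent (standard definition, assumed)."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed to be the 'Correlation' meant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical kernel cycle counts, for illustration only.
measured = [1000.0, 2000.0, 4000.0, 8000.0]
predicted = [1100.0, 1800.0, 4200.0, 7600.0]

print(f"MAPE: {mape(predicted, measured):.2f}%")
print(f"Correlation: {pearson(predicted, measured):.4f}")
```

A low MAPE indicates that individual predictions are close to the measurements, while a high correlation indicates that the predictions track the trend across kernels. The two are reported together because either one can look good on its own while the other is poor.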


Pages: 15

50 items in total

[21] Practical and Scalable ML-Driven Cloud Performance Debugging With Sage
Gan, Yu
Liang, Mingyu
Dev, Sundar
Lo, David
Delimitrou, Christina
IEEE MICRO, 2022, 42 (04) : 27 - 36
[22] Congestion and performance driven full-chip scalable routing framework
Yao, HL
Cai, YC
Hong, XL
Zhou, Q
2005 6TH INTERNATIONAL CONFERENCE ON ASIC PROCEEDINGS, BOOKS 1 AND 2, 2005, : 768 - 771
[23] Accurately modeling speculative instruction fetching in trace-driven simulation
Bhargava, R
John, LK
Matus, F
1999 IEEE INTERNATIONAL PERFORMANCE, COMPUTING AND COMMUNICATIONS CONFERENCE, 1999, : 65 - 71
[24] TRACE-DRIVEN MODELING AND ANALYSIS OF CPU SCHEDULING IN A MULTIPROGRAMMING SYSTEM
SHERMAN, S
BROWNE, JC
BASKETT, F
COMMUNICATIONS OF THE ACM, 1972, 15 (12) : 1063 - &
[25] HYBRID COMPUTER PERFORMANCE MODELING SYSTEM
FOXLEY, E
COMPUTER JOURNAL, 1978, 21 (03) : 205 - 209
[26] Performance modeling of scalable encryption algorithm using parallel computation
2013, UK Simulation Society, Clifton Lane, Nottingham, NG11 8NS, United Kingdom (14):
[27] Performance modeling of distributed hybrid architectures
Spinnato, PF
van Albada, GD
Sloot, PMA
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2004, 15 (01) : 81 - 92
[28] Performance Modeling of Scalable Resource Allocations with the Imperial PEPA Compiler
Sanders, William S.
Srivastava, Srishti
Banicescu, Ioana
2022 21ST INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC 2022), 2022, : 99 - 106
[29] Modeling and performance analysis of Scalable Web Servers Deployed on the Cloud
Aljohani, A. M. D.
Holton, D. R. W.
Awan, I.
2013 EIGHTH INTERNATIONAL CONFERENCE ON BROADBAND, WIRELESS COMPUTING, COMMUNICATION AND APPLICATIONS (BWCCA 2013), 2013, : 238 - 242
[30] Modeling laser performance of scalable side pumped alkali laser
Komashko, Aleksey M.
Zweiback, Jason
HIGH ENERGY/AVERAGE POWER LASERS AND INTENSE BEAM APPLICATIONS IV, 2010, 7581
