Hybrid, Scalable, Trace-Driven Performance Modeling of GPGPUs

Cited by: 7
Authors
Arafa, Yehia [1]
Badawy, Abdel-Hameed [1]
ElWazir, Ammar [1]
Barai, Atanu [1]
Eker, Ali [2]
Chennupati, Gopinath [3]
Santhi, Nandakishore [4]
Eidenbenz, Stephan [4]
Affiliations
[1] New Mexico State Univ, Klipsch Sch ECE, Las Cruces, NM 88003 USA
[2] Binghamton Univ, Binghamton, NY USA
[3] Amazon Alexa, New York, NY USA
[4] Los Alamos Natl Lab, Los Alamos, NM USA
Keywords
NVIDIA GPUs; Modeling and Simulation; Design Space Exploration; Performance Prediction; PTX; SASS; GPU; Roofline
DOI
10.1145/3458817.3476221
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs. PPT-GPU achieves scalability through a hybrid high-level modeling approach where some computations are extrapolated and multiple parts of the model are parallelized. The tool's primary prediction models use pre-collected memory and instruction traces of the workloads to accurately capture the dynamic behavior of the kernels. PPT-GPU reports an extensive array of GPU performance metrics accurately while being easily extensible. We use a broad set of benchmarks to verify prediction accuracy. We compare the results against hardware metrics collected using vendor profiling tools and cycle-accurate simulators. The results show that the performance predictions are highly correlated to the actual hardware (MAPE: < 16% and Correlation: > 0.98). Moreover, PPT-GPU is orders of magnitude faster than cycle-accurate simulators. This comprehensiveness of the collected metrics can guide architects and developers to perform design space explorations. Moreover, the scalability of the tool enables conducting efficient and fast sensitivity analyses for performance-critical applications.
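The abstract reports accuracy as a mean absolute percentage error (MAPE) and a correlation coefficient between predicted and measured metrics. The following is a minimal sketch of how such figures are commonly computed, assuming the correlation is a Pearson coefficient (the abstract does not say which kind). The function names and the sample cycle counts are hypothetical and are not taken from the paper or from PPT-GPU.

```python
import math


def mape(predicted, actual):
    """Mean absolute percentage error, in percent.

    Raises ZeroDivisionError if any actual value is zero.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical cycle counts, made up for illustration only.
    measured = [1000.0, 2500.0, 4000.0, 8000.0]
    predicted = [1100.0, 2300.0, 4200.0, 7600.0]
    print("MAPE (%):", mape(predicted, measured))
    print("Pearson r:", pearson(predicted, measured))
```

A low MAPE together with a correlation close to 1 indicates that predictions track the hardware both in magnitude and in trend; either number alone can be misleading.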
Pages: 15