ParTejas: A Parallel Simulator for Multicore Processors

Cited by: 5

Authors:

Malhotra, Geetika [1]

Kalayappan, Rajshekar [1]

Goel, Seep [1]

Aggarwal, Pooja [1]

Sagar, Abhishek [1]

Sarangi, Smruti R. [1]

Affiliation:

[1] Indian Inst Technol Delhi, Dept Comp Sci & Engn, New Delhi 110016, India

Source:

ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION | 2017, Vol. 27, No. 3

Keywords:

Parallel simulation; architectural simulator; Tejas; ParTejas; phasers; parallel ports; slot scheduling; SYSTEM SIMULATION;

DOI:

10.1145/3077582

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

In this article, we present the design of a novel parallel architecture simulator called ParTejas. ParTejas is a timing simulation engine that gets its execution traces from instrumented binaries using a fast shared-memory-based mechanism. Subsequently, the waiting threads simulate the execution of multiple pipelines and an elaborate memory system with support for multilevel coherent caches. ParTejas is written in Java and primarily derives its speedups from the use of novel data structures. Specifically, it uses lock-free slot schedulers to design an entity called a parallel port that effectively models the contention at shared resources in the CPU and memory system. Parallel ports remove the need for fine-grained synchronization and allow each thread to use its local clock. Unlike conventional simulators that use barriers for synchronization at epoch boundaries, we use a sophisticated type of barrier, known as a phaser. A phaser allows threads to perform additional work without waiting for other threads to arrive at the barrier. Additionally, we use a host of Java-specific optimizations and use profiling to effectively schedule the threads. With all our optimizations, we demonstrate a speedup of 11.8x for a multi-issue in-order pipeline and 10.9x for an out-of-order pipeline with 64 threads, for a suite of seven Splash2 and Parsec benchmarks. The simulation error is limited to 2% to 4% as compared to strictly sequential simulation.
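The phaser behaviour described in the abstract (a thread signals that it has reached the epoch boundary, keeps doing useful work, and only later waits for the others) can be illustrated with the JDK's `java.util.concurrent.Phaser`. This is a minimal sketch under stated assumptions, not the ParTejas implementation: the record does not say which phaser implementation the simulator uses, and the class name `PhaserEpochSketch`, the method `runEpochs`, and the counters are hypothetical stand-ins for real simulation work.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; names are not taken from ParTejas.
class PhaserEpochSketch {

    // Runs `threads` workers for `epochs` epochs and returns how many units of
    // "additional work" were done between arriving at and waiting on the phaser.
    static int runEpochs(int threads, int epochs) {
        Phaser phaser = new Phaser(threads);          // one registered party per worker
        AtomicInteger extraWork = new AtomicInteger();
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int e = 0; e < epochs; e++) {
                    // ... simulate this worker's share of the epoch here ...

                    // Signal arrival at the epoch boundary without blocking.
                    int phase = phaser.arrive();

                    // Additional work that does not depend on the other workers.
                    extraWork.incrementAndGet();

                    // Block only now, until every worker has arrived for this phase.
                    phaser.awaitAdvance(phase);
                }
            });
            workers[t].start();
        }

        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining workers", ex);
            }
        }
        return extraWork.get();
    }

    public static void main(String[] args) {
        System.out.println("extra work units: " + runEpochs(4, 3));
    }
}
```

With a plain barrier, the "additional work" step would have to wait until all workers had arrived; splitting the synchronization into `arrive()` and `awaitAdvance()` is what lets it overlap with the slower workers' epoch.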


Pages: 24

