ParTejas: A Parallel Simulator for Multicore Processors

Cited by: 5
Authors
Malhotra, Geetika [1]
Kalayappan, Rajshekar [1]
Goel, Seep [1]
Aggarwal, Pooja [1]
Sagar, Abhishek [1]
Sarangi, Smruti R. [1]
Affiliation
[1] Indian Inst Technol Delhi, Dept Comp Sci & Engn, New Delhi 110016, India
Keywords
Parallel simulation; architectural simulator; Tejas; ParTejas; phasers; parallel ports; slot scheduling; system simulation
DOI
10.1145/3077582
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In this article, we present the design of a novel parallel architecture simulator called ParTejas. ParTejas is a timing simulation engine that obtains its execution traces from instrumented binaries through a fast shared-memory-based mechanism. The simulator threads, which wait on these traces, then simulate the execution of multiple pipelines and an elaborate memory system with support for multilevel coherent caches. ParTejas is written in Java and derives its speedups primarily from novel data structures. Specifically, it uses lock-free slot schedulers to build an entity called a parallel port, which effectively models contention at shared resources in the CPU and memory system. Parallel ports remove the need for fine-grained synchronization and allow each thread to use its own local clock. Unlike conventional simulators, which synchronize with barriers at epoch boundaries, we use a more sophisticated type of barrier known as a phaser. A phaser allows threads to perform additional work without waiting for other threads to arrive at the barrier. Additionally, we apply a host of Java-specific optimizations and use profiling to schedule the threads effectively. With all our optimizations, we demonstrate a speedup of 11.8x for a multi-issue in-order pipeline and 10.9x for an out-of-order pipeline with 64 threads, on a suite of seven Splash2 and Parsec benchmarks. The simulation error is limited to 2% to 4% relative to strictly sequential simulation.
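The abstract names two mechanisms, parallel ports built on lock-free slot schedulers and phaser-delimited epochs, without showing how they fit together. The Java sketch below illustrates both under stated assumptions, using the standard java.util.concurrent.Phaser and java.util.concurrent.atomic.AtomicIntegerArray classes. The names ParallelPort, reserve and simulate, the 64-cycle window and the slot width are hypothetical; this is not ParTejas's implementation, and this fixed-window version never recycles slots.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: a "parallel port" resolved with CAS, plus phaser epochs.
// ParallelPort, reserve() and simulate() are illustrative names, not ParTejas's API.
class ParallelPortSketch {

    // A shared resource offering `width` slots per simulated cycle.
    // used[c] counts the slots already taken in cycle c.
    static final class ParallelPort {
        private final int width;
        private final AtomicIntegerArray used;

        ParallelPort(int width, int windowCycles) {
            this.width = width;
            this.used = new AtomicIntegerArray(windowCycles);
        }

        // Lock-free reservation: claim a slot in the earliest cycle >= requested
        // that still has capacity, retrying with compare-and-set on conflicts.
        // Assumption: a fixed window with no slot recycling.
        int reserve(int requested) {
            for (int c = requested; c < used.length(); c++) {
                int n = used.get(c);
                while (n < width) {
                    if (used.compareAndSet(c, n, n + 1)) {
                        return c;
                    }
                    n = used.get(c); // another thread won the race; re-read
                }
            }
            throw new IllegalStateException("reservation window exhausted");
        }

        int usedAt(int cycle) { return used.get(cycle); }
    }

    // Runs `threads` simulator threads for `epochs` epochs against one port
    // and returns the per-cycle occupancy of the whole window.
    static int[] simulate(int threads, int epochs, int width) {
        final int window = 64;
        ParallelPort port = new ParallelPort(width, window);
        Phaser phaser = new Phaser(threads); // one party per simulator thread
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                int localClock = 0; // each thread keeps its own local clock
                for (int e = 0; e < epochs; e++) {
                    // Contention is resolved by the port, not by a lock.
                    localClock = port.reserve(localClock) + 1;
                    int phase = phaser.arrive(); // announce epoch end without blocking
                    // Thread-local work that does not depend on other threads
                    // could run here before waiting.
                    phaser.awaitAdvance(phase); // block until all threads have arrived
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        int[] occupancy = new int[window];
        for (int c = 0; c < window; c++) occupancy[c] = port.usedAt(c);
        return occupancy;
    }

    public static void main(String[] args) {
        int[] occ = simulate(4, 3, 2);
        for (int c = 0; c < 8; c++) System.out.println("cycle " + c + ": " + occ[c]);
    }
}
```

With 4 threads, 3 epochs and 2 slots per cycle, each epoch's 4 requests fill two consecutive cycles, so cycles 0 through 5 each end with 2 slots taken regardless of thread interleaving. The split arrive()/awaitAdvance() pair is what separates a phaser from a plain barrier such as CyclicBarrier.await(), which blocks immediately on arrival.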
Pages: 24