Mirage Cores: The Illusion of Many Out-of-order Cores Using In-order Hardware

Authors
Padmanabha, Shruti [1]
Lukefahr, Andrew [2]
Das, Reetuparna [1]
Mahlke, Scott [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Indiana Univ, Bloomington, IN 47405 USA
Funding
National Science Foundation (USA);
Keywords
Heterogeneous multicores; Energy-efficient architectures; CMP scheduling; POWER MANAGEMENT; PERFORMANCE; IMPACT;
DOI
10.1145/3123939.3123969
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Heterogeneous chip multiprocessors (Het-CMPs) offer a combination of large Out-of-Order (OoO) cores optimized for high single-threaded performance and small In-Order (InO) cores optimized for low energy and area costs. Due to practical constraints, CMP designers must choose to either optimize for total system throughput by utilizing many InO cores or maximize single-thread execution with fewer OoO cores. We propose Mirage Cores, a novel Het-CMP design where clusters of InO cores are architected around an OoO core in a manner that optimizes for both throughput and single-thread performance. The insight behind Mirage Cores is that InO cores can achieve near-OoO performance if they are provided with the dynamic instruction schedule of an OoO core. To leverage this, Mirage Cores employs an OoO core as an optimal instruction schedule generator as well as a high-performance alternative for all neighboring InO cores. We also develop intelligent runtime schedulers which orchestrate the arbitration and migration of applications between the InO cores and the central OoO core. Fast and timely transfer of dynamic schedules from the OoO core to the InO cores allows Mirage Cores to create the appearance of an all-OoO system to the user while using underlying In-Order hardware. Overall, with an 8 InO per OoO configuration, Mirage Cores can achieve on average 84% of the performance of a CMP with 8 OoO cores, a 28% increase relative to current systems, while conserving 55% of energy and 25% of area costs. We find that we can scale the design to around 12 InOs per OoO before starvation for the OoO starts to hamper system performance.
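The arbitration idea in the abstract (several InO cores sharing one central OoO core) can be illustrated with a toy sketch. This is not the paper's scheduler: the `App` fields, the `expected_speedup` estimate, and the greedy "largest expected benefit wins" rule are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    # Hypothetical flag: an OoO-generated schedule is already available,
    # so the app can keep running on its InO core.
    has_schedule: bool
    # Hypothetical estimate of the benefit of running on the OoO core.
    expected_speedup: float


def pick_for_ooo(apps):
    """Return the app that gets the shared OoO core, or None.

    Assumed policy: among apps that lack a schedule, pick the one
    with the largest estimated speedup.
    """
    candidates = [a for a in apps if not a.has_schedule]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.expected_speedup)


def assign(apps):
    """Map each app name to "OoO" (at most one app) or "InO"."""
    chosen = pick_for_ooo(apps)
    return {a.name: ("OoO" if a is chosen else "InO") for a in apps}
```

With three apps where only `a` already has a schedule, `assign` gives the OoO core to whichever of the other two has the higher estimated speedup and leaves the rest on InO cores. With more InO cores per OoO core, more apps compete for the single OoO slot, which is the kind of contention the abstract describes as starvation.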
Pages: 745-758
Page count: 14