Reconstructing Out-of-Order Issue Queue

Cited by: 2
Authors
Jeong, Ipoom [1]
Lee, Jiwon [1]
Yoon, Myung Kuk [2]
Ro, Won Woo [1]
Affiliations
[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul, South Korea
[2] Ewha Womans Univ, Dept Comp Sci & Engn, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Dynamic Scheduling; Data Dependence; Steering; INSTRUCTION; MICROARCHITECTURE; CORE;
DOI
10.1109/MICRO56248.2022.00023
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Out-of-order cores provide high performance at the cost of energy efficiency. Dynamic scheduling is one of the major contributors to this: generating highly optimized issue schedules considering both data dependences and underlying execution resources, but relying heavily on complex wakeup and select operations of an out-of-order issue queue (IQ). For decades, researchers have proposed several complexity-effective dynamic scheduling schemes by leveraging the energy efficiency of an in-order IQ. However, they are either costly or not capable of delivering sufficient performance to substitute for a conventional wide-issue out-of-order IQ. In this work, we revisit two previous designs: one classical dependence-based design and the other state-of-the-art readiness-based design. We observe that they are complementary to each other, and thus their synergistic integration has the potential to be a good alternative to an out-of-order IQ. We first combine these two designs, and further analyze the main architectural bottlenecks that incur the underutilization of aggregate issue capability, thereby limiting the exploitation of instruction-level and memory-level parallelisms: 1) memory dependences not exposed by the register-based dependence analysis and 2) wide and shallow nature of dynamic dependence chains due to the long-latency memory accesses. To this end, we propose Ballerino, a novel microarchitecture that performs balanced and cache-miss-tolerable dynamic scheduling via a complementary combination of cascaded and clustered in-order IQs. Ballerino is built upon three key functionalities: 1) speculatively filtering out ready-at-dispatch instructions, 2) eliminating wasteful wakeup operations via a simple steering technique leveraging the awareness of memory dependences, and 3) reacting to program phase changes by allowing different load-dependent chains to share a single IQ while guaranteeing their out-of-order issue. 
The net effect is minimal scheduling energy consumption per instruction while providing comparable scheduling performance to a fully out-of-order IQ. In our analysis, Ballerino achieves comparable performance to an 8-wide out-of-order core by using twelve in-order IQs, improving core-wide energy efficiency by 20%.
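To make the dependence-based steering idea mentioned in the abstract concrete, here is a minimal Python sketch. It is not the Ballerino microarchitecture: it only illustrates the classical idea of placing an instruction directly behind its producer in an in-order (FIFO) queue, so that only queue heads ever need to be checked for issue. The names `Instr` and `steer`, the register names, and the fallback to the shortest queue when no queue is empty are all assumptions made for illustration (a real design would typically stall dispatch instead).

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: a name, one destination, some sources."""
    name: str
    dst: str
    srcs: tuple = ()


def steer(instrs, num_queues):
    """Steer instructions into in-order queues by register dependence.

    An instruction joins a queue only if that queue's current tail produces
    one of its source registers; otherwise it starts a new chain.
    """
    queues = [deque() for _ in range(num_queues)]
    tail_producer = {}  # register -> index of the queue that last produced it
    placement = {}      # instruction name -> chosen queue index

    for ins in instrs:
        target = None
        for src in ins.srcs:
            q = tail_producer.get(src)
            # Follow the producer only if it is still the tail of its queue.
            if q is not None and queues[q] and queues[q][-1].dst == src:
                target = q
                break
        if target is None:
            empty = [i for i, q in enumerate(queues) if not q]
            # Assumption: with no empty queue, fall back to the shortest one.
            target = empty[0] if empty else min(
                range(num_queues), key=lambda i: len(queues[i]))
        queues[target].append(ins)
        tail_producer[ins.dst] = target
        placement[ins.name] = target
    return queues, placement


# Two independent load-use chains that merge in the last instruction.
program = [
    Instr("i0", "r1"),                # load r1
    Instr("i1", "r2", ("r1",)),       # r2 = f(r1)
    Instr("i2", "r3"),                # load r3
    Instr("i3", "r4", ("r3",)),       # r4 = f(r3)
    Instr("i4", "r5", ("r2", "r4")),  # r5 = f(r2, r4)
]
queues, placement = steer(program, num_queues=4)
print(placement)
```

In this example the two load-headed chains land in separate queues, so their heads can issue independently, while `i4` follows the first producer that is still a queue tail.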
Pages: 144-161
Number of pages: 18