Reconstructing Out-of-Order Issue Queue

Cited by: 2
Authors
Jeong, Ipoom [1]
Lee, Jiwon [1]
Yoon, Myung Kuk [2]
Ro, Won Woo [1]
Affiliations
[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul, South Korea
[2] Ewha Womans Univ, Dept Comp Sci & Engn, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Dynamic Scheduling; Data Dependence; Steering; INSTRUCTION; MICROARCHITECTURE; CORE;
DOI
10.1109/MICRO56248.2022.00023
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Out-of-order cores provide high performance at the cost of energy efficiency. Dynamic scheduling is one of the major contributors to this: generating highly optimized issue schedules considering both data dependences and underlying execution resources, but relying heavily on complex wakeup and select operations of an out-of-order issue queue (IQ). For decades, researchers have proposed several complexity-effective dynamic scheduling schemes by leveraging the energy efficiency of an in-order IQ. However, they are either costly or not capable of delivering sufficient performance to substitute for a conventional wide-issue out-of-order IQ. In this work, we revisit two previous designs: one classical dependence-based design and the other state-of-the-art readiness-based design. We observe that they are complementary to each other, and thus their synergistic integration has the potential to be a good alternative to an out-of-order IQ. We first combine these two designs, and further analyze the main architectural bottlenecks that incur the underutilization of aggregate issue capability, thereby limiting the exploitation of instruction-level and memory-level parallelisms: 1) memory dependences not exposed by the register-based dependence analysis and 2) wide and shallow nature of dynamic dependence chains due to the long-latency memory accesses. To this end, we propose Ballerino, a novel microarchitecture that performs balanced and cache-miss-tolerable dynamic scheduling via a complementary combination of cascaded and clustered in-order IQs. Ballerino is built upon three key functionalities: 1) speculatively filtering out ready-at-dispatch instructions, 2) eliminating wasteful wakeup operations via a simple steering technique leveraging the awareness of memory dependences, and 3) reacting to program phase changes by allowing different load-dependent chains to share a single IQ while guaranteeing their out-of-order issue. 
The net effect is minimal scheduling energy consumption per instruction while providing comparable scheduling performance to a fully out-of-order IQ. In our analysis, Ballerino achieves comparable performance to an 8-wide out-of-order core by using twelve in-order IQs, improving core-wide energy efficiency by 20%.
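The classical dependence-based idea that the abstract revisits can be illustrated with a toy model. The sketch below is not Ballerino and is not taken from the paper: it only shows instructions being steered into in-order FIFOs behind their producers and issuing from queue heads only. All names (`Instr`, `steer`, `simulate`), the latencies, and the shortest-queue fallback are assumptions made for illustration. It also assumes each destination register is written once and that every source refers to an earlier instruction or a live-in value.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str            # destination register (assumed written only once)
    srcs: tuple = ()     # source registers
    latency: int = 1     # cycles until the result is available


def steer(program, num_queues):
    """Assign each instruction to a FIFO, chaining it behind a producer when possible."""
    queues = [deque() for _ in range(num_queues)]
    for idx, ins in enumerate(program):
        target = None
        # Prefer a queue whose tail instruction produces one of our sources.
        for q in queues:
            if q and program[q[-1]].dest in ins.srcs:
                target = q
                break
        if target is None:
            # Hypothetical fallback: an empty queue if one exists, else the shortest.
            target = min(queues, key=len)
        target.append(idx)
    return queues


def simulate(program, queues):
    """Issue only from queue heads; return {instruction index: issue cycle}."""
    produced = {ins.dest for ins in program}
    ready_at = {}        # register -> cycle at which its value becomes available
    issue_cycle = {}
    cycle = 0
    while any(queues):
        for q in queues:
            if not q:
                continue
            ins = program[q[0]]
            # A source is ready if it is a live-in or its producer has completed.
            if all(s not in produced or ready_at.get(s, cycle + 1) <= cycle
                   for s in ins.srcs):
                issue_cycle[q.popleft()] = cycle
                ready_at[ins.dest] = cycle + ins.latency
        cycle += 1
    return issue_cycle


program = [
    Instr("r1", (), 3),          # 0: long-latency producer (e.g. a load)
    Instr("r2", ("r1",)),        # 1: depends on 0
    Instr("r3", ()),             # 2: independent of 0 and 1
    Instr("r4", ("r3",)),        # 3: depends on 2
    Instr("r5", ("r2", "r4")),   # 4: joins both chains
]

two_queues = simulate(program, steer(program, 2))
one_queue = simulate(program, steer(program, 1))
```

In this toy example, two FIFOs let the independent chain (instructions 2 and 3) issue while the first chain waits on the long-latency producer, whereas a single FIFO serializes everything behind it. The wakeup and select logic only ever inspects queue heads, which is the source of the energy savings that in-order IQ designs aim for.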
Pages: 144-161
Page count: 18