CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance

Citations: 1
Authors
Mohammadi, Milad [1]
Aamodt, Tor M. [2]
Dally, William J. [3, 4]
Affiliations
[1] Stanford Univ, Comp Syst Lab, Gates Room 241, Stanford, CA 94305 USA
[2] Univ British Columbia, Dept Elect & Comp Engn, 2332 Main Mall, Vancouver, BC, Canada
[3] NVIDIA, Santa Clara, CA USA
[4] Stanford Univ, Comp Syst Lab, Gates Room 301, Stanford, CA 94305 USA
Keywords
Energy efficiency; CPU architecture; block-level execution; DYNAMIC INSTRUMENTATION; INSTRUCTION SET; DESIGN; MICROARCHITECTURE; ARCHITECTURES; END
DOI
10.1145/3151034
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We introduce the Coarse-Grain Out-of-Order (CG-OoO) general-purpose processor, designed to achieve close to In-Order (InO) processor energy while maintaining Out-of-Order (OoO) performance. CG-OoO is an energy-performance-proportional architecture. Block-level code processing is at the heart of this architecture; CG-OoO speculates, fetches, schedules, and commits code at block-level granularity. It eliminates unnecessary accesses to energy-consuming tables and turns large tables into smaller, distributed tables that are cheaper to access. CG-OoO leverages compiler-level code optimizations to deliver efficient static code and exploits dynamic block-level and instruction-level parallelism. CG-OoO introduces Skipahead, a complexity-effective, limited out-of-order instruction scheduling model. Through the energy efficiency techniques applied to the compiler and processor pipeline stages, CG-OoO closes 62% of the average energy gap between the InO and OoO baseline processors at the same area and nearly the same performance as the OoO. This makes CG-OoO 1.8x more efficient than the OoO on the inverse energy-delay product metric. CG-OoO meets the OoO nominal performance while trading off the peak scheduling performance for superior energy efficiency.
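The two metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes that "energy gap closed" means the fraction of the InO-to-OoO energy difference recovered by the new design, and that the inverse energy-delay product is 1 / (energy × delay). The function names and all numeric values are hypothetical, chosen only to show the arithmetic; they are not results from the paper.

```python
def gap_closed(e_ino: float, e_ooo: float, e_new: float) -> float:
    """Fraction of the InO-to-OoO energy gap closed by a design using e_new.

    Assumed definition: 0.0 means the design uses as much energy as the OoO
    baseline, 1.0 means it matches the InO baseline.
    """
    return (e_ooo - e_new) / (e_ooo - e_ino)


def inverse_edp(energy: float, delay: float) -> float:
    """Inverse energy-delay product; higher is better."""
    return 1.0 / (energy * delay)


def efficiency_ratio(e_new: float, d_new: float,
                     e_ooo: float, d_ooo: float) -> float:
    """How many times more efficient the new design is than the OoO baseline."""
    return inverse_edp(e_new, d_new) / inverse_edp(e_ooo, d_ooo)


# Hypothetical normalized values (NOT taken from the paper).
E_INO, E_OOO, E_NEW = 0.5, 1.0, 0.75   # energy, normalized to the OoO baseline
D_OOO, D_NEW = 1.0, 1.0                # delay, assumed equal for simplicity

if __name__ == "__main__":
    print("gap closed:", gap_closed(E_INO, E_OOO, E_NEW))
    print("efficiency vs. OoO:", efficiency_ratio(E_NEW, D_NEW, E_OOO, D_OOO))
```

With equal delay, the efficiency ratio reduces to the energy ratio between the two designs, which is why matching OoO performance matters for the inverse energy-delay product comparison.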
Pages: 26