CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance

Cited by: 1

Authors:

Mohammadi, Milad ^{[1]}

Aamodt, Tor M. ^{[2]}

Dally, William J. ^{[3,4]}

Affiliations:

[1] Stanford Univ, Comp Syst Lab, Gates Room 241, Stanford, CA 94305 USA

[2] Univ British Columbia, Dept Elect & Comp Engn, 2332 Main Mall, Vancouver, BC, Canada

[3] NVIDIA, Santa Clara, CA USA

[4] Stanford Univ, Comp Syst Lab, Gates Room 301, Stanford, CA 94305 USA

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2017 / Vol. 14 / Issue 04

Keywords:

Energy efficiency; CPU architecture; block-level execution; DYNAMIC INSTRUMENTATION; INSTRUCTION SET; DESIGN; MICROARCHITECTURE; ARCHITECTURES; END;

DOI:

10.1145/3151034

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

We introduce the Coarse-Grain Out-of-Order (CG-OoO) general-purpose processor designed to achieve close to In-Order (InO) processor energy while maintaining Out-of-Order (OoO) performance. CG-OoO is an energy-performance-proportional architecture. Block-level code processing is at the heart of this architecture; CG-OoO speculates, fetches, schedules, and commits code at block-level granularity. It eliminates unnecessary accesses to energy-consuming tables and turns large tables into smaller, distributed tables that are cheaper to access. CG-OoO leverages compiler-level code optimizations to deliver efficient static code and exploits dynamic block-level and instruction-level parallelism. CG-OoO introduces Skipahead, a complexity-effective, limited out-of-order instruction scheduling model. Through the energy efficiency techniques applied to the compiler and processor pipeline stages, CG-OoO closes 62% of the average energy gap between the InO and OoO baseline processors at the same area and nearly the same performance as the OoO. This makes CG-OoO 1.8x more efficient than the OoO on the energy-delay product inverse metric. CG-OoO meets the OoO nominal performance while trading off the peak scheduling performance for superior energy efficiency.


Pages: 26


