CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance

Cited by: 1

Authors:

Mohammadi, Milad [1]

Aamodt, Tor M. [2]

Dally, William J. [3, 4]

Affiliations:

[1] Stanford Univ, Comp Syst Lab, Gates Room 241, Stanford, CA 94305 USA

[2] Univ British Columbia, Dept Elect & Comp Engn, 2332 Main Mall, Vancouver, BC, Canada

[3] NVIDIA, Santa Clara, CA USA

[4] Stanford Univ, Comp Syst Lab, Gates Room 301, Stanford, CA 94305 USA

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2017, Vol. 14, No. 4

Keywords:

Energy efficiency; CPU architecture; block-level execution; DYNAMIC INSTRUMENTATION; INSTRUCTION SET; DESIGN; MICROARCHITECTURE; ARCHITECTURES; END;

DOI:

10.1145/3151034

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

We introduce the Coarse-Grain Out-of-Order (CG-OoO) general-purpose processor designed to achieve close to In-Order (InO) processor energy while maintaining Out-of-Order (OoO) performance. CG-OoO is an energy-performance-proportional architecture. Block-level code processing is at the heart of this architecture; CG-OoO speculates, fetches, schedules, and commits code at block-level granularity. It eliminates unnecessary accesses to energy-consuming tables and turns large tables into smaller, distributed tables that are cheaper to access. CG-OoO leverages compiler-level code optimizations to deliver efficient static code and exploits dynamic block-level and instruction-level parallelism. CG-OoO introduces Skipahead, a complexity-effective, limited out-of-order instruction scheduling model. Through the energy efficiency techniques applied to the compiler and processor pipeline stages, CG-OoO closes 62% of the average energy gap between the InO and OoO baseline processors at the same area and nearly the same performance as the OoO. This makes CG-OoO 1.8x more efficient than the OoO on the energy-delay product inverse metric. CG-OoO meets the OoO nominal performance while trading off the peak scheduling performance for superior energy efficiency.
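
The two headline metrics in the abstract can be illustrated with a minimal sketch. The definitions below are the conventional ones (fraction of the InO-to-OoO energy gap recovered, and the reciprocal of energy times delay); the abstract does not spell out the exact normalization the authors used, so treat both formulas as assumptions. The function names and all numeric inputs are hypothetical, chosen only to show the arithmetic, and are not measurements from the paper.

```python
def energy_gap_closed(e_ino: float, e_ooo: float, e_new: float) -> float:
    """Fraction of the InO-to-OoO energy gap that a new design recovers.

    Assumed definition: 0.0 means the design uses as much energy as the
    OoO baseline, 1.0 means it matches the InO baseline.
    """
    return (e_ooo - e_new) / (e_ooo - e_ino)


def inverse_edp(energy: float, delay: float) -> float:
    """Inverse energy-delay product (higher is better), assumed as 1 / (E * D)."""
    return 1.0 / (energy * delay)


# Hypothetical normalized energies (NOT taken from the paper).
e_ino, e_ooo, e_cg = 1.0, 2.0, 1.38
print(f"gap closed: {energy_gap_closed(e_ino, e_ooo, e_cg):.2f}")

# With equal delay, an inverse-EDP advantage reduces to the energy ratio.
# Hypothetical values again: 1.8 units of energy for OoO versus 1.0 for the new design.
ratio = inverse_edp(1.0, 1.0) / inverse_edp(1.8, 1.0)
print(f"inverse-EDP ratio: {ratio:.2f}")
```

The two printed values come from separate hypothetical inputs on purpose: the 62% and 1.8x figures in the abstract are averages, so a single operating point need not reproduce both at once.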


Pages: 26
