CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance

Cited by: 1

Authors:

Mohammadi, Milad [1]

Aamodt, Tor M. [2]

Dally, William J. [3, 4]

Affiliations:

[1] Stanford Univ, Comp Syst Lab, Gates Room 241, Stanford, CA 94305 USA

[2] Univ British Columbia, Dept Elect & Comp Engn, 2332 Main Mall, Vancouver, BC, Canada

[3] NVIDIA, Santa Clara, CA USA

[4] Stanford Univ, Comp Syst Lab, Gates Room 301, Stanford, CA 94305 USA

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2017, Vol. 14, No. 04

Keywords:

Energy efficiency; CPU architecture; block-level execution; DYNAMIC INSTRUMENTATION; INSTRUCTION SET; DESIGN; MICROARCHITECTURE; ARCHITECTURES; END;

DOI:

10.1145/3151034

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

We introduce the Coarse-Grain Out-of-Order (CG-OoO) general-purpose processor designed to achieve close to In-Order (InO) processor energy while maintaining Out-of-Order (OoO) performance. CG-OoO is an energy-performance-proportional architecture. Block-level code processing is at the heart of this architecture; CG-OoO speculates, fetches, schedules, and commits code at block-level granularity. It eliminates unnecessary accesses to energy-consuming tables and turns large tables into smaller, distributed tables that are cheaper to access. CG-OoO leverages compiler-level code optimizations to deliver efficient static code and exploits dynamic block-level and instruction-level parallelism. CG-OoO introduces Skipahead, a complexity-effective, limited out-of-order instruction scheduling model. Through the energy efficiency techniques applied to the compiler and processor pipeline stages, CG-OoO closes 62% of the average energy gap between the InO and OoO baseline processors at the same area and nearly the same performance as the OoO. This makes CG-OoO 1.8x more efficient than the OoO on the energy-delay product inverse metric. CG-OoO meets the OoO nominal performance while trading off the peak scheduling performance for superior energy efficiency.


Pages: 26
