CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance

Authors
Mohammadi, Milad [1]
Aamodt, Tor M. [2]
Dally, William J. [3,4]
Affiliations
[1] Stanford Univ, Comp Syst Lab, Gates Room 241, Stanford, CA 94305 USA
[2] Univ British Columbia, Dept Elect & Comp Engn, 2332 Main Mall, Vancouver, BC, Canada
[3] NVIDIA, Santa Clara, CA USA
[4] Stanford Univ, Comp Syst Lab, Gates Room 301, Stanford, CA 94305 USA
Keywords
Energy efficiency; CPU architecture; block-level execution; DYNAMIC INSTRUMENTATION; INSTRUCTION SET; DESIGN; MICROARCHITECTURE; ARCHITECTURES; END;
DOI
10.1145/3151034
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
We introduce the Coarse-Grain Out-of-Order (CG-OoO) general-purpose processor, designed to achieve close to In-Order (InO) processor energy while maintaining Out-of-Order (OoO) performance. CG-OoO is an energy-performance-proportional architecture. Block-level code processing is at the heart of this architecture: CG-OoO speculates, fetches, schedules, and commits code at block-level granularity. It eliminates unnecessary accesses to energy-consuming tables and turns large tables into smaller, distributed tables that are cheaper to access. CG-OoO leverages compiler-level code optimizations to deliver efficient static code and exploits dynamic block-level and instruction-level parallelism. CG-OoO introduces Skipahead, a complexity-effective, limited out-of-order instruction scheduling model. Through energy-efficiency techniques applied to the compiler and the processor pipeline stages, CG-OoO closes 62% of the average energy gap between the InO and OoO baseline processors, at the same area and nearly the same performance as the OoO. This makes CG-OoO 1.8× more efficient than the OoO on the inverse energy-delay product metric. CG-OoO matches the nominal performance of the OoO while trading off peak scheduling performance for superior energy efficiency.
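The two headline figures in the abstract compare energy and delay against the InO and OoO baselines. The sketch below assumes the usual definitions: the fraction of the energy gap closed is (E_OoO - E_new) / (E_OoO - E_InO), and the gain on the inverse energy-delay product is (E_OoO * D_OoO) / (E_new * D_new). The paper's exact averaging across benchmarks may differ. The function names and all input values are hypothetical and are not results from the paper.

```python
# Illustrative sketch of the two metrics named in the abstract, under the
# assumed definitions stated above. All inputs are hypothetical normalized values.


def energy_gap_closed(e_ino: float, e_ooo: float, e_new: float) -> float:
    """Fraction of the InO-to-OoO energy gap closed by a new design.

    0.0 means the new design uses as much energy as the OoO baseline;
    1.0 means it uses as little as the InO baseline.
    """
    gap = e_ooo - e_ino
    if gap <= 0:
        raise ValueError("expected the OoO baseline to use more energy than InO")
    return (e_ooo - e_new) / gap


def inverse_edp_gain(e_ooo: float, d_ooo: float, e_new: float, d_new: float) -> float:
    """How many times higher the new design's 1/EDP is than the OoO baseline's.

    EDP = energy * delay, so a higher 1/EDP is better.
    """
    return (e_ooo * d_ooo) / (e_new * d_new)


if __name__ == "__main__":
    # Hypothetical numbers, NOT results from the paper.
    e_ino, e_ooo, e_new = 1.0, 3.0, 2.0
    d_ooo, d_new = 1.0, 1.0
    print(energy_gap_closed(e_ino, e_ooo, e_new))        # 0.5
    print(inverse_edp_gain(e_ooo, d_ooo, e_new, d_new))  # 1.5
```

With these made-up inputs the design closes half of the energy gap and has a 1.5× higher 1/EDP. Because the second metric also includes delay, a design that is faster can gain more on 1/EDP than its energy savings alone would suggest.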
Pages: 26