CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance

Cited by: 1
Authors
Mohammadi, Milad [1]
Aamodt, Tor M. [2]
Dally, William J. [3, 4]
Affiliations
[1] Stanford Univ, Comp Syst Lab, Gates Room 241, Stanford, CA 94305 USA
[2] Univ British Columbia, Dept Elect & Comp Engn, 2332 Main Mall, Vancouver, BC, Canada
[3] NVIDIA, Santa Clara, CA USA
[4] Stanford Univ, Comp Syst Lab, Gates Room 301, Stanford, CA 94305 USA
Keywords
Energy efficiency; CPU architecture; block-level execution; DYNAMIC INSTRUMENTATION; INSTRUCTION SET; DESIGN; MICROARCHITECTURE; ARCHITECTURES; END
DOI
10.1145/3151034
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We introduce the Coarse-Grain Out-of-Order (CG-OoO) general-purpose processor designed to achieve close to In-Order (InO) processor energy while maintaining Out-of-Order (OoO) performance. CG-OoO is an energy-performance-proportional architecture. Block-level code processing is at the heart of this architecture; CG-OoO speculates, fetches, schedules, and commits code at block-level granularity. It eliminates unnecessary accesses to energy-consuming tables and turns large tables into smaller, distributed tables that are cheaper to access. CG-OoO leverages compiler-level code optimizations to deliver efficient static code and exploits dynamic block-level and instruction-level parallelism. CG-OoO introduces Skipahead, a complexity-effective, limited out-of-order instruction scheduling model. Through the energy efficiency techniques applied to the compiler and processor pipeline stages, CG-OoO closes 62% of the average energy gap between the InO and OoO baseline processors at the same area and nearly the same performance as the OoO. This makes CG-OoO 1.8x more efficient than the OoO on the energy-delay product inverse metric. CG-OoO meets the OoO nominal performance while trading off the peak scheduling performance for superior energy efficiency.
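The two figures of merit quoted in the abstract (fraction of the InO-to-OoO energy gap closed, and inverse energy-delay product) can be illustrated with a minimal sketch. The function names and all numeric inputs below are hypothetical, chosen only to show the arithmetic; they are not measurements from the paper, and the paper's exact metric definitions may differ.

```python
def gap_closed(e_ino: float, e_ooo: float, e_new: float) -> float:
    """Fraction of the energy gap between an in-order (InO) and an
    out-of-order (OoO) baseline that a new design closes.
    0.0 means OoO-level energy, 1.0 means InO-level energy."""
    return (e_ooo - e_new) / (e_ooo - e_ino)


def inverse_edp(energy: float, delay: float) -> float:
    """Inverse energy-delay product: higher is better."""
    return 1.0 / (energy * delay)


# Hypothetical, normalized numbers (assumptions, not data from the paper).
e_ino, e_ooo, e_new = 1.0, 2.0, 1.38   # energy per program, arbitrary units
d_ooo, d_new = 1.0, 1.0                # execution time, assumed equal here

fraction = gap_closed(e_ino, e_ooo, e_new)
advantage = inverse_edp(e_new, d_new) / inverse_edp(e_ooo, d_ooo)

print(f"energy gap closed: {fraction:.0%}")
print(f"inverse-EDP advantage over OoO: {advantage:.2f}x")
```

With equal delay, the inverse-EDP advantage reduces to the energy ratio between the two designs, so any difference in execution time scales the result proportionally.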
Pages: 26