Runahead execution: An alternative to very large instruction windows for out-of-order processors

Cited by: 152
Authors
Mutlu, O [1 ]
Stark, J [1 ]
Wilkerson, C [1 ]
Patt, YN [1 ]
Affiliations
[1] Univ Texas, ECE Dept, Austin, TX 78712 USA
DOI
10.1109/HPCA.2003.1183532
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Today's high-performance processors tolerate long-latency operations by means of out-of-order execution. However, as latencies increase, the size of the instruction window must increase even faster if we are to continue to tolerate these latencies. We have already reached the point where the size of an instruction window that can handle these latencies is prohibitively large, in terms of both design complexity and power consumption. Moreover, the problem is getting worse. This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window. Runahead execution unblocks the instruction window blocked by long-latency operations, allowing the processor to execute far ahead in the program path. This results in data being prefetched into caches long before it is needed. On a machine model based on the Intel® Pentium® 4 processor with a 128-entry instruction window, adding runahead execution improves the IPC (Instructions Per Cycle) by 22% across a wide range of memory-intensive applications. Also, for the same machine model, runahead execution combined with a 128-entry window performs within 1% of a machine with no runahead execution and a 384-entry instruction window.
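The prefetching effect described in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under strong assumptions and is not the paper's machine model: every instruction retires in one cycle, a blocking cache miss stalls for a fixed `miss_latency`, misses that fall within the processor's "reach" during a stall are treated as fully overlapped, and runahead is assumed to extend that reach by one instruction per stall cycle. The function name `simulate` and all parameter values are hypothetical.

```python
def simulate(trace, window, miss_latency, runahead):
    """Return total cycles for a toy processor model.

    trace        -- list of bools; True marks a load that misses the cache
    window       -- instruction window size (entries)
    miss_latency -- cycles needed to service one cache miss
    runahead     -- if True, keep pre-executing during a stall (assumption:
                    one extra instruction of reach per stall cycle)
    """
    cycles = 0
    prefetched = set()  # indices of misses already overlapped with a stall
    n = len(trace)
    for i in range(n):
        if trace[i] and i not in prefetched:
            # Blocking miss: the window fills up behind it and the core stalls.
            reach = window + (miss_latency if runahead else 0)
            # Misses inside the reach are issued during the stall, so they
            # are assumed to hit by the time the program gets to them.
            for j in range(i + 1, min(n, i + reach)):
                if trace[j]:
                    prefetched.add(j)
            cycles += miss_latency
        else:
            cycles += 1
    return cycles


# Hypothetical workload: one cache-missing load every 100 instructions.
trace = [k % 100 == 0 for k in range(1000)]
baseline = simulate(trace, window=128, miss_latency=500, runahead=False)
with_runahead = simulate(trace, window=128, miss_latency=500, runahead=True)
print(baseline, with_runahead)  # prints "3495 1998"
```

In this toy setting the 128-entry window alone overlaps only every other miss, while the runahead variant covers several misses per stall. The numbers say nothing about the 22% IPC improvement reported in the abstract; they only show the direction of the effect.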
Pages: 129-140
Page count: 12