Future execution: A hardware prefetching technique for chip multiprocessors

Authors
Ganusov, I [1]
Burtscher, M [1]
Affiliation
[1] Cornell Univ, Comp Syst Lab, Ithaca, NY 14853 USA
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-control instructions in the prefetching core after they have executed in the primary core. On the way to the second core, each instruction's output is replaced by a prediction of the likely output that the nth future instance of this instruction will produce. Speculatively executing the resulting instruction stream on the second core issues load requests that the main program will probably reference in the future. Unlike previously proposed thread-based prefetching approaches, our technique does not need any thread spawning points, features an adjustable lookahead distance, does not require complicated analyzers to extract prefetching threads, is recovery-free, and necessitates no storage for the prefetching threads. We demonstrate that for the SPECcpu2000 benchmark suite, our mechanism significantly increases the prefetching coverage and improves the primary core's performance by 10% on average over a baseline that already includes an aggressive hardware stream prefetcher. We further show that our approach works well in combination with runahead execution.
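The value-substitution step described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not say which value predictor is used, so a simple per-instruction stride predictor stands in for it, and the names `StridePredictor`, `update_and_predict`, and `future_addresses`, as well as the instruction address `0x40`, are hypothetical. The model feeds the committed outputs of one address-producing instruction through the predictor and records the value the second core would use instead, which is a guess at the output n instances ahead.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last: int = 0     # most recent committed output of this static instruction
    stride: int = 0   # difference between the two most recent outputs


class StridePredictor:
    """Hypothetical stride predictor; an assumption, not taken from the abstract."""

    def __init__(self, lookahead):
        self.lookahead = lookahead   # the adjustable lookahead distance n
        self.table = {}              # maps instruction address -> StrideEntry

    def update_and_predict(self, pc, value):
        """Record a committed output and guess the nth future instance's output."""
        entry = self.table.setdefault(pc, StrideEntry(last=value))
        entry.stride = value - entry.last
        entry.last = value
        return value + self.lookahead * entry.stride


def future_addresses(base, step, iterations, lookahead):
    """Model a loop whose address computation (at hypothetical pc 0x40) feeds a load."""
    predictor = StridePredictor(lookahead)
    prefetches = []
    for i in range(iterations):
        addr = base + i * step                             # output on the primary core
        predicted = predictor.update_and_predict(0x40, addr)
        prefetches.append(predicted)                       # address the second core would load
    return prefetches


if __name__ == "__main__":
    print(future_addresses(base=1000, step=8, iterations=4, lookahead=3))
```

In this model the first instance has no stride history, so its prediction equals the committed value; from the second instance on, the predicted address runs `lookahead` strides ahead of the primary core, which is the effect that would turn the second core's loads into prefetches.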
Pages: 350-360
Number of pages: 11
Related papers
  • [1] A hybrid hardware/software generated Prefetching Thread mechanism on Chip Multiprocessors
    Rui, Hou
    Zhang, Longbing
    Hu, Weiwu
    [J]. EURO-PAR 2006 PARALLEL PROCESSING, 2006, 4128 : 506 - 516
  • [2] MTB-Fetch: Multithreading Aware Hardware Prefetching for Chip Multiprocessors
    AlBarakat, Laith M.
    Gratz, Paul V.
    Jimenez, Daniel A.
    [J]. IEEE COMPUTER ARCHITECTURE LETTERS, 2018, 17 (02) : 175 - 178
  • [3] Interactions between compression and prefetching in chip multiprocessors
    Alameldeen, Alaa R.
    Wood, David A.
    [J]. THIRTEENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS, 2007, : 228 - +
  • [4] Analyzing the Impact of Data Prefetching on Chip MultiProcessors
    Fukumoto, Naoto
    Mihara, Tomonobu
    Inoue, Koji
    Murakami, Kazuaki
    [J]. 2008 13th Asia-Pacific Computer Systems Architecture Conference, 2008, : 300 - 307
  • [5] Multiplexed Redundant Execution: A Technique for Efficient Fault Tolerance in Chip Multiprocessors
    Subramanyan, Pramod
    Singh, Virendra
    Saluja, Kewal K.
    Larsson, Erik
    [J]. 2010 DESIGN, AUTOMATION & TEST IN EUROPE (DATE 2010), 2010, : 1572 - 1577
  • [6] Sequential Hardware Prefetching in Shared-Memory Multiprocessors
    Dahlgren, F
    Dubois, M
    Stenstrom, P
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1995, 6 (07) : 733 - 746
  • [7] A Simple Activation/Deactivation Prefetching Scheme for Chip Multiprocessors
    Selfa, Vicent
    Gomez, Crispin
    Gomez, Maria E.
    Sahuquillo, Julio
    [J]. 2016 24TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP), 2016, : 143 - 150
  • [8] Future of Multiprocessors: Heterogeneous Chip Multiprocessors
    Qayum, Mohammad Abdul
    Siddique, Nafiul Alam
    Haque, Mohammad Atiqul
    Tayeen, Abu Saleh Md.
    [J]. 2012 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2012, : 372 - 376
  • [9] Adaptive Prefetching for Shared Cache Based Chip Multiprocessors
    Kandemir, Mahmut
    Zhang, Yuanrui
    Ozturk, Ozcan
    [J]. DATE: 2009 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, VOLS 1-3, 2009, : 773 - +
  • [10] Effective instruction prefetching in chip multiprocessors for modern commercial applications
    Spracklen, L
    Chou, Y
    Abraham, SG
    [J]. 11TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS, 2005, : 225 - 236