Decoupled Vector Runahead for Prefetching Nested Memory-Access Chains

Authors
Naithani, Ajeya [1]
Roelandts, Jaime [1]
Ainsworth, Sam [2]
Jones, Timothy M. [3]
Eeckhout, Lieven [1]
Affiliations
[1] Univ Ghent, B-9000 Ghent, Belgium
[2] Univ Edinburgh, Edinburgh EH8 9AB, Scotland
[3] Univ Cambridge, Comp Architecture & Compilat, Cambridge CB3 0FD, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK; European Research Council (ERC)
Keywords
Prefetching; Vectors; Out of order; Parallel processing; Microarchitecture; Hardware; Tracking loops
DOI
10.1109/MM.2024.3406891
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Decoupled vector runahead (DVR) exploits massive amounts of memory-level parallelism to improve the performance of applications that feature indirect memory accesses by dynamically inferring loop bounds at runtime, recognizing striding loads, and speculatively vectorizing the subsequent instructions that are part of an indirect chain. DVR runs as an on-demand, speculative, in-order, lightweight hardware subthread alongside the main thread within the core. DVR incurs minimal hardware overhead while delivering a substantial performance boost.
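The steps named in the abstract (recognizing a striding load, bounding the loop, and vectorizing the dependent indirect access) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the article: the function names `detect_stride` and `indirect_targets`, the `lanes` parameter, and the use of Python lists to stand in for memory are all hypothetical and chosen for illustration of a `B[A[i]]` access pattern.

```python
from typing import List, Optional


def detect_stride(addresses: List[int], min_repeats: int = 2) -> Optional[int]:
    """Return the constant non-zero stride seen over the most recent
    addresses of one load, or None if no stable stride is visible.

    Hypothetical helper: a real detector would be a small hardware table
    indexed by load PC; here it is just a check over a Python list.
    """
    if len(addresses) < min_repeats + 1:
        return None
    recent = addresses[-(min_repeats + 1):]
    deltas = {later - earlier for earlier, later in zip(recent, recent[1:])}
    if len(deltas) != 1:
        return None
    stride = deltas.pop()
    return stride if stride != 0 else None


def indirect_targets(index_array: List[int], current_i: int,
                     loop_bound: int, lanes: int) -> List[int]:
    """Model one vectorized runahead step for the pattern B[A[i]].

    Reads A[current_i + 1 .. current_i + lanes] (clipped to the assumed
    loop bound) and returns the indices into B that those future
    iterations would touch, i.e. the addresses worth prefetching.
    """
    first = current_i + 1
    last = min(first + lanes, loop_bound)
    return [index_array[j] for j in range(first, last)]


if __name__ == "__main__":
    # Three observed addresses of the striding load A[i], 4 bytes apart.
    print(detect_stride([0x1000, 0x1004, 0x1008]))
    # Future B indices for iterations 2..4 of a 6-iteration loop.
    print(indirect_targets([7, 2, 9, 4, 1, 8], current_i=1, loop_bound=6, lanes=3))
```

In this model the loop bound is passed in directly; the abstract states that DVR infers it dynamically at runtime, which the sketch does not attempt to reproduce.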
Pages: 20-26
Page count: 7