Improving the Utilization of Micro-operation Caches in x86 Processors

Cited by: 6
Authors
Kotra, Jagadish B. [1 ]
Kalamatianos, John [1 ]
Affiliations
[1] AMD Res, Austin, TX 78735 USA
Keywords
Micro-operations Cache; CPU front-end; CISC; x86; Micro-ops; Compression
DOI
10.1109/MICRO50266.2020.00025
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Most modern processors employ variable-length Complex Instruction Set Computing (CISC) instructions to reduce instruction fetch energy cost and bandwidth requirements. High-throughput decoding of CISC instructions requires energy-hungry logic for instruction identification. Executing CISC instructions efficiently motivated mapping them to fixed-length micro-operations (also known as uops). To reduce costly decoder activity, commercial CISC processors employ a micro-operations cache (uop cache) that caches uop sequences, bypassing the decoder. The uop cache's benefits are: (1) shorter pipeline length for uops dispatched by the uop cache, (2) lower decoder energy consumption, and (3) earlier detection of mispredicted branches. In this paper, we observe that a uop cache can be heavily fragmented under certain uop cache entry construction rules. Based on this observation, we propose two complementary optimizations to address fragmentation: Cache Line boundary AgnoStic uoP cache design (CLASP) and uop cache compaction. CLASP addresses the internal fragmentation caused by short, sequential uop sequences that are terminated at the I-cache line boundary, by fusing them into a single uop cache entry. Compaction further lowers fragmentation by placing temporally correlated, non-sequential uop sequences that map to the same uop cache set in the same uop cache entry. Our experiments on an x86 simulator using a wide variety of benchmarks show that CLASP improves performance by up to 5.6% and lowers decoder power by up to 19.63%. When CLASP is coupled with the most aggressive compaction variant, performance improves by up to 12.8% and decoder power savings are up to 31.53%.
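The fragmentation argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator or its entry construction rules. The entry capacity of 8 uops, the `UopSeq` record, and the function names are all hypothetical, chosen only to show how ending every entry at an I-cache line boundary leaves slots unused, and how fusing sequential sequences in the spirit of CLASP recovers them.

```python
from dataclasses import dataclass

# Assumption: each uop cache entry holds up to 8 uops (illustrative value only).
ENTRY_CAPACITY = 8


@dataclass
class UopSeq:
    line: int         # I-cache line index the sequence was decoded from
    uops: int         # number of uops in the sequence (assumed <= ENTRY_CAPACITY)
    sequential: bool  # True if it continues directly from the previous sequence


def baseline_entries(seqs):
    """Baseline rule: every sequence ends at its I-cache line boundary,
    so each sequence occupies an entry of its own."""
    return [[s] for s in seqs]


def clasp_like_entries(seqs):
    """CLASP-style rule (simplified): fuse a sequence into the previous entry
    when it is sequential with it and the combined uops still fit."""
    entries = []
    for s in seqs:
        fits = bool(entries) and (
            sum(x.uops for x in entries[-1]) + s.uops <= ENTRY_CAPACITY
        )
        if s.sequential and fits:
            entries[-1].append(s)
        else:
            entries.append([s])
    return entries


def utilization(entries):
    """Fraction of uop slots in the allocated entries that hold a uop."""
    used = sum(s.uops for e in entries for s in e)
    return used / (len(entries) * ENTRY_CAPACITY)


# Two short runs of code, each spilling across an I-cache line boundary.
seqs = [
    UopSeq(line=0, uops=3, sequential=False),
    UopSeq(line=1, uops=4, sequential=True),
    UopSeq(line=5, uops=2, sequential=False),
    UopSeq(line=6, uops=5, sequential=True),
]

print(len(baseline_entries(seqs)), utilization(baseline_entries(seqs)))
print(len(clasp_like_entries(seqs)), utilization(clasp_like_entries(seqs)))
```

With these made-up inputs the baseline allocates four entries that are less than half full, while the fused layout needs two. Compaction, which the abstract describes as placing non-sequential sequences from the same set into one entry, is not modeled here.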
Pages: 160-172
Page count: 13