Improving the Utilization of Micro-operation Caches in x86 Processors

Cited by: 6
Authors
Kotra, Jagadish B. [1]
Kalamatianos, John [1]
Affiliation
[1] AMD Res, Austin, TX 78735 USA
Keywords
Micro-operations cache; CPU front-end; CISC; x86; Micro-ops; Compression
DOI
10.1109/MICRO50266.2020.00025
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Most modern processors employ variable-length Complex Instruction Set Computing (CISC) instructions to reduce instruction fetch energy cost and bandwidth requirements. High-throughput decoding of CISC instructions requires energy-hungry logic for instruction identification. The need to execute CISC instructions efficiently motivated mapping them to fixed-length micro-operations (also known as uops). To reduce costly decoder activity, commercial CISC processors employ a micro-operations cache (uop cache) that caches uop sequences, bypassing the decoder. The uop cache's benefits are: (1) shorter pipeline length for uops dispatched by the uop cache, (2) lower decoder energy consumption, and (3) earlier detection of mispredicted branches. In this paper, we observe that a uop cache can be heavily fragmented under certain uop cache entry construction rules. Based on this observation, we propose two complementary optimizations to address fragmentation: Cache Line boundary AgnoStic uoP cache design (CLASP) and uop cache compaction. CLASP addresses the internal fragmentation caused by short, sequential uop sequences that are terminated at the I-cache line boundary by fusing them into a single uop cache entry. Compaction further lowers fragmentation by placing temporally correlated, non-sequential uop sequences that map to the same uop cache set in the same uop cache entry. Our experiments on an x86 simulator using a wide variety of benchmarks show that CLASP improves performance by up to 5.6% and lowers decoder power by up to 19.63%. When CLASP is coupled with the most aggressive compaction variant, performance improves by up to 12.8% and decoder power savings reach up to 31.53%.
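The fragmentation described in the abstract can be illustrated with a toy model. The sketch below is not the paper's design or its simulator: the entry capacity of 8 uops, the function names, and the greedy fusing rule are all assumptions made for illustration. It compares one-entry-per-segment construction (each sequence terminated at an I-cache line boundary) with a CLASP-like policy that lets a following sequential segment share the current entry when it fits entirely.

```python
from typing import List

ENTRY_CAPACITY = 8  # hypothetical number of uops per uop cache entry


def baseline_entries(segments: List[int], capacity: int = ENTRY_CAPACITY) -> int:
    """Entries used when every segment ends at an I-cache line boundary.

    Each segment (a count of uops, assumed positive) gets its own entries,
    so short segments leave the rest of their entry unused.
    """
    return sum(-(-n // capacity) for n in segments)  # ceiling division


def fused_entries(segments: List[int], capacity: int = ENTRY_CAPACITY) -> int:
    """Entries used by a toy, CLASP-like policy (an assumption, not the paper's rule).

    A sequential segment is placed in the free space of the current entry
    when it fits entirely; otherwise it starts new entries.
    """
    entries = 0
    free = 0  # unused uop slots in the most recently allocated entry
    for n in segments:
        if n <= free:
            free -= n
            continue
        needed = -(-n // capacity)
        entries += needed
        free = needed * capacity - n
    return entries


def utilization(segments: List[int], entries: int,
                capacity: int = ENTRY_CAPACITY) -> float:
    """Fraction of allocated uop slots that actually hold uops."""
    return sum(segments) / (entries * capacity)


if __name__ == "__main__":
    # Hypothetical uop counts for five consecutive I-cache-line-bounded segments.
    segs = [3, 2, 4, 1, 6]
    for name, count in (("baseline", baseline_entries(segs)),
                        ("fused", fused_entries(segs))):
        print(name, count, round(utilization(segs, count), 3))
```

With the hypothetical segment sizes above, the baseline uses 5 entries at 40% utilization, while the fused policy uses 3 entries at roughly 67%. Compaction of non-sequential sequences is not modeled here.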
Pages: 160-172
Page count: 13