Automatic Core Specialization for AVX-512 Applications

Cited by: 7
Authors
Gottschlag, Mathias [1]
Brantsch, Peter [1]
Bellosa, Frank [1]
Affiliation
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
Keywords
AVX-512; core specialization; dim silicon;
DOI
10.1145/3383669.3398282
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Advanced Vector Extension (AVX) instructions operate on wide SIMD vectors. Due to the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. Following non-AVX code is slowed down by this frequency reduction in two situations: When it executes on the sibling hyperthread of the same core in parallel or - as restoring the non-AVX frequency is delayed - when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average. In this work, we describe a method to mitigate the frequency reduction slowdown for workloads involving AVX-512 instructions in both situations. Our approach employs core specialization and partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach is able to reduce the frequency reduction overhead by 70% for an assortment of common benchmarks.
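The migration scheme described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the core numbering, the `Thread` class, the `step` and `simulate` helpers, and in particular the `IDLE_LIMIT` rule for migrating back are all hypothetical, chosen only to show the idea of trapping AVX-512 use on one partition and moving the thread to the other.

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration; none are taken from the paper.
AVX_CORES = {3}            # cores allowed to execute AVX-512 code
NON_AVX_CORES = {0, 1, 2}  # cores modeled as trapping on AVX-512 code
IDLE_LIMIT = 2             # assumed heuristic: AVX-free steps before moving back


@dataclass
class Thread:
    name: str
    core: int
    steps_since_avx: int = 0


def step(thread: Thread, uses_avx512: bool) -> None:
    """Advance the toy model by one scheduling step for one thread."""
    if uses_avx512:
        thread.steps_since_avx = 0
        if thread.core in NON_AVX_CORES:
            # Models the exception on a non-AVX-512 core:
            # the thread is migrated to an AVX-512 core.
            thread.core = min(AVX_CORES)
        return
    thread.steps_since_avx += 1
    if thread.core in AVX_CORES and thread.steps_since_avx >= IDLE_LIMIT:
        # Assumed heuristic: migrate back after a run of AVX-free steps.
        thread.core = min(NON_AVX_CORES)


def simulate(trace: list, start_core: int = 0) -> list:
    """Return the core the thread occupies after each step of the trace."""
    thread = Thread("worker", start_core)
    cores = []
    for uses_avx512 in trace:
        step(thread, uses_avx512)
        cores.append(thread.core)
    return cores


if __name__ == "__main__":
    # True marks a step that executes AVX-512 instructions.
    print(simulate([False, True, False, False, False, True]))
```

In this model the thread leaves core 0 only when it first touches AVX-512 code and returns once it has run `IDLE_LIMIT` consecutive steps without it, so any frequency reduction would stay confined to the AVX-512 partition.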
Pages: 25-35
Number of pages: 11