Automatic Core Specialization for AVX-512 Applications

Times cited: 7

Authors:

Gottschlag, Mathias [1]

Brantsch, Peter [1]

Bellosa, Frank [1]

Affiliation:

[1] Karlsruhe Institute of Technology, Karlsruhe, Germany

Source:

Proceedings of the 13th ACM International Systems and Storage Conference (SYSTOR 2020) | 2020

Keywords:

AVX-512; core specialization; dim silicon;

DOI:

10.1145/3383669.3398282

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Advanced Vector Extensions (AVX) instructions operate on wide SIMD vectors. Because of the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. This frequency reduction also slows down non-AVX code in two situations: when that code executes in parallel on the sibling hyperthread of the same core, or, because restoring the non-AVX frequency is delayed, when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average. In this work, we describe a method to mitigate this slowdown in both situations for workloads involving AVX-512 instructions. Our approach employs core specialization: it partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions, so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach reduces the frequency reduction overhead by 70% for an assortment of common benchmarks.
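The abstract describes the mechanism only in prose. The Python sketch below is a toy model of the policy, not the authors' implementation. It assumes that non-AVX-512 cores are configured by clearing the AVX-512 state-component bits (5 to 7) of XCR0, which is one way on x86 to make AVX-512 instructions raise #UD; the abstract itself only says the cores "raise an exception". The paper's migrate-back heuristic is not specified here, so the sketch substitutes a hypothetical fixed tick threshold (`migrate_back_after`). The names `CoreSpecializationScheduler`, `Thread`, `tick`, and `avx512_enabled` are illustrative.

```python
from dataclasses import dataclass

# XCR0 state-component bits (Intel SDM): AVX-512 needs bits 1-2 (SSE, AVX) and
# bits 5-7 (opmask, ZMM_Hi256, Hi16_ZMM); if any is clear, AVX-512 raises #UD.
XCR0_SSE_AVX = (1 << 1) | (1 << 2)
XCR0_AVX512 = (1 << 5) | (1 << 6) | (1 << 7)


def avx512_enabled(xcr0: int) -> bool:
    required = XCR0_SSE_AVX | XCR0_AVX512
    return (xcr0 & required) == required


@dataclass
class Thread:
    tid: int
    core: int
    ticks_on_avx512_core: int = 0


class CoreSpecializationScheduler:
    """Toy model; `migrate_back_after` is a hypothetical heuristic threshold."""

    def __init__(self, xcr0_per_core: dict, migrate_back_after: int = 3):
        self.avx512_cores = sorted(c for c, v in xcr0_per_core.items() if avx512_enabled(v))
        self.other_cores = sorted(c for c, v in xcr0_per_core.items() if not avx512_enabled(v))
        self.migrate_back_after = migrate_back_after
        self.migrations = []  # log of (tid, from_core, to_core)

    def _move(self, thread: Thread, targets: list) -> None:
        # Static placement for the sketch: spread threads over targets by id.
        new_core = targets[thread.tid % len(targets)]
        self.migrations.append((thread.tid, thread.core, new_core))
        thread.core = new_core
        thread.ticks_on_avx512_core = 0

    def tick(self, thread: Thread, uses_avx512: bool) -> bool:
        """Simulate one scheduling tick; return True if the thread trapped."""
        on_avx512_core = thread.core in self.avx512_cores
        if uses_avx512 and not on_avx512_core:
            # AVX-512 state is disabled in this core's XCR0, so the instruction
            # raises #UD; the handler migrates the thread, which then retries.
            self._move(thread, self.avx512_cores)
            return True
        if on_avx512_core:
            # No trap fires here, so AVX-512 use is not observed; this sketch
            # simply migrates the thread back after a fixed number of ticks.
            thread.ticks_on_avx512_core += 1
            if thread.ticks_on_avx512_core >= self.migrate_back_after:
                self._move(thread, self.other_cores)
        return False


if __name__ == "__main__":
    FULL = 0xE7                        # x87, SSE, AVX, opmask, ZMM_Hi256, Hi16_ZMM
    RESTRICTED = FULL & ~XCR0_AVX512   # 0x07: AVX-512 traps on these cores
    sched = CoreSpecializationScheduler(
        {0: FULL, 1: FULL, 2: RESTRICTED, 3: RESTRICTED, 4: RESTRICTED, 5: RESTRICTED})
    t = Thread(tid=7, core=3)
    traps = [sched.tick(t, u) for u in [False, True, True, False, False, False]]
    print(traps)             # [False, True, False, False, False, False]
    print(sched.migrations)  # [(7, 3, 1), (7, 1, 5)]
```

The fixed threshold trades accuracy for simplicity: because AVX-512 instructions never trap on AVX-512 cores, a thread that is still vectorizing gets moved back anyway and simply traps again, costing one extra migration per round trip.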


Pages: 25-35

Page count: 11
