Automatic Core Specialization for AVX-512 Applications

Cited by: 7

Authors:

Gottschlag, Mathias [1]

Brantsch, Peter [1]

Bellosa, Frank [1]

Affiliation:

[1] Karlsruhe Inst Technol, Karlsruhe, Germany

Source:

PROCEEDINGS OF THE 13TH ACM INTERNATIONAL SYSTEMS AND STORAGE CONFERENCE (SYSTOR 2020) | 2020

Keywords:

AVX-512; core specialization; dim silicon

DOI:

10.1145/3383669.3398282

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Advanced Vector Extensions (AVX) instructions operate on wide SIMD vectors. Due to the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. Non-AVX code is slowed down by this frequency reduction in two situations: when it executes in parallel on the sibling hyperthread of the same core, or, because restoring the non-AVX frequency is delayed, when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average. In this work, we describe a method to mitigate the frequency reduction slowdown for workloads involving AVX-512 instructions in both situations. Our approach employs core specialization: it partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions, so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach is able to reduce the frequency reduction overhead by 70% for an assortment of common benchmarks.
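The migration policy outlined in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: the core numbering, the `Thread` class, the `run_slice` and `simulate` helpers, and the "migrate back after three time slices without AVX-512" rule are all hypothetical, since the abstract does not specify the actual heuristic.

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration; the abstract does not specify them.
AVX512_CORES = {2, 3}            # cores allowed to execute AVX-512 code
OTHER_CORES = {0, 1}             # cores that trap on AVX-512 instructions
IDLE_SLICES_BEFORE_RETURN = 3    # assumed threshold for migrating back


@dataclass
class Thread:
    name: str
    core: int
    slices_without_avx512: int = 0


def run_slice(thread: Thread, uses_avx512: bool) -> None:
    """Advance one time slice and apply the assumed migration policy."""
    if uses_avx512:
        if thread.core in OTHER_CORES:
            # Models the exception raised on a non-AVX-512 core:
            # the thread is moved to an AVX-512 core before it continues.
            thread.core = min(AVX512_CORES)
        thread.slices_without_avx512 = 0
    elif thread.core in AVX512_CORES:
        thread.slices_without_avx512 += 1
        if thread.slices_without_avx512 >= IDLE_SLICES_BEFORE_RETURN:
            # Assumed heuristic: return after a quiet period without AVX-512.
            thread.core = min(OTHER_CORES)
            thread.slices_without_avx512 = 0


def simulate(trace):
    """Return the core a single thread occupies after each time slice."""
    thread = Thread("worker", core=0)
    history = []
    for uses_avx512 in trace:
        run_slice(thread, uses_avx512)
        history.append(thread.core)
    return history
```

Calling `simulate` with a list of booleans (one per time slice, `True` meaning the slice executes AVX-512 instructions) shows the thread moving to an AVX-512 core on first use and moving back only once the assumed quiet period has elapsed, so any frequency reduction stays confined to the AVX-512 cores.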

Pages: 25-35

Page count: 11


