Automatic Core Specialization for AVX-512 Applications

Cited by: 7

Authors:

Gottschlag, Mathias [1]

Brantsch, Peter [1]

Bellosa, Frank [1]

Affiliations:

[1] Karlsruhe Inst Technol, Karlsruhe, Germany

Source:

PROCEEDINGS OF THE 13TH ACM INTERNATIONAL SYSTEMS AND STORAGE CONFERENCE (SYSTOR 2020) | 2020

Keywords:

AVX-512; core specialization; dim silicon;

DOI:

10.1145/3383669.3398282

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Advanced Vector Extension (AVX) instructions operate on wide SIMD vectors. Due to the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. This frequency reduction also slows down non-AVX code in two situations: when it executes in parallel on the sibling hyperthread of the same core, or, because restoring the non-AVX frequency is delayed, when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average. In this work, we describe a method to mitigate the frequency reduction slowdown for workloads involving AVX-512 instructions in both situations. Our approach employs core specialization: it partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions, so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach is able to reduce the frequency reduction overhead by 70% for an assortment of common benchmarks.
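The migration policy outlined in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation: the core numbering, the `Thread` class, the `step` and `simulate` helpers, and the `MIGRATE_BACK_AFTER` threshold are all hypothetical, and the abstract does not specify the actual migrate-back heuristic. It only shows the general shape: an AVX-512 burst on a non-AVX-512 core triggers a migration, and a run of non-AVX-512 bursts triggers the return.

```python
from dataclasses import dataclass

# Assumptions for illustration only: a 4-core CPU split into two partitions.
NON_AVX512_CORES = {0, 1}   # cores configured to trap on AVX-512 instructions
AVX512_CORES = {2, 3}       # cores allowed to execute AVX-512 instructions
MIGRATE_BACK_AFTER = 3      # hypothetical heuristic: scalar bursts before returning


@dataclass
class Thread:
    core: int = 0
    scalar_streak: int = 0  # consecutive non-AVX-512 bursts on an AVX-512 core


def step(thread: Thread, uses_avx512: bool) -> str:
    """Run one instruction burst and return a label for the action taken."""
    if uses_avx512:
        thread.scalar_streak = 0
        if thread.core in NON_AVX512_CORES:
            # Models the exception raised on a non-AVX-512 core.
            thread.core = min(AVX512_CORES)
            return "trap, migrate to AVX-512 core"
        return "run AVX-512 code"
    if thread.core in AVX512_CORES:
        thread.scalar_streak += 1
        if thread.scalar_streak >= MIGRATE_BACK_AFTER:
            thread.scalar_streak = 0
            thread.core = min(NON_AVX512_CORES)
            return "migrate back to non-AVX-512 core"
        return "run scalar code on AVX-512 core"
    return "run scalar code"


def simulate(trace):
    """Return (action, core after the burst) for each burst in the trace."""
    thread = Thread()
    return [(step(thread, uses_avx512), thread.core) for uses_avx512 in trace]


if __name__ == "__main__":
    for action, core in simulate([False, True, True, False, False, False, False]):
        print(f"core {core}: {action}")
```

In this model only the AVX-512 partition would ever be exposed to the frequency reduction; a larger threshold keeps threads there longer, while a smaller one returns them sooner at the cost of more migrations.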


Pages: 25-35

Page count: 11

50 entries in total

[41] Fast Modular Squaring with AVX512IFMA
Drucker, Nir
Gueron, Shay
16TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY-NEW GENERATIONS (ITNG 2019), 2019, 800 : 3 - 8
[42] Optimizing Dilithium Implementation with AVX2/-512
Xu, Runqing
He, Debiao
Luo, Min
Peng, Cong
Zeng, Xiangyong
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (06)
[43] CCF: An efficient SpMV storage format for AVX512 platforms
Almasri, Mohammad
Abu-Sufah, Walid
PARALLEL COMPUTING, 2020, 100
[44] Evolving AVX512 Parallel C Code Using GP
Langdon, William B.
Lorenz, Ronny
GENETIC PROGRAMMING, EUROGP 2019, 2019, 11451 : 245 - 261
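[45] High-Speed Parallel Implementation of Lattice-Based Cryptography Based on AVX512
Lei, Douwei
He, Debiao
Luo, Min
Peng, Cong
Computer Engineering, 2024, 50 (02) : 15 - 24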
[46] Accelerating Large Integer Multiplication Using Intel AVX-512IFMA
Edamatsu, Takuya
Takahashi, Daisuke
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING (ICA3PP 2019), PT I, 2020, 11944 : 60 - 74
[47] Fast Multiple Montgomery Multiplications Using Intel AVX-512IFMA Instructions
Takahashi, Daisuke
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT V, 2020, 12253 : 655 - 663
[48] Faster Implementation of Ideal Lattice-Based Cryptography Using AVX512
Lei, Douwei
He, Debiao
Peng, Cong
Luo, Min
Liu, Zhe
Huang, Xinyi
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2023, 22 (05)
[49] The RACECAR Heuristic for Automatic Function Specialization on Multi-core Heterogeneous Systems
Wernsing, John Robert
Stitt, Greg
Fowers, Jeremy
CASES'12: PROCEEDINGS OF THE 2012 ACM INTERNATIONAL CONFERENCE ON COMPILERS, ARCHITECTURES AND SYNTHESIS FOR EMBEDDED SYSTEMS, 2012 : 81 - 90
[50] RACECAR: A Heuristic for Automatic Function Specialization on Multi-core Heterogeneous Systems
Wernsing, John R.
Stitt, Greg
ACM SIGPLAN NOTICES, 2012, 47 (08) : 321 - 322
