Optimizing parallel GEMM routines using auto-tuning with Intel AVX-512

Cited by: 16

Authors:

Kim, Raehyun [1]

Choi, Jaeyoung [1]

Lee, Myungho [2]

Affiliations:

[1] Soongsil Univ, Seoul, South Korea

[2] Myongji Univ, Yongin, Gyeonggi, South Korea

Source:

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2019) | 2019

Keywords:

Manycore; Intel Xeon; Intel Xeon Phi; Autotuning; matrix-matrix multiplication; AVX-512

DOI:

10.1145/3293320.3293334

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

This paper presents optimal implementations of single- and double-precision general matrix-matrix multiplication (GEMM) routines for the Intel Xeon Phi processor code-named Knights Landing (KNL) and the Intel Xeon Scalable processors, based on an auto-tuning approach that uses the Intel AVX-512 intrinsic functions. The auto-tuning approach determines parameters that reflect the features of the target architecture. It significantly reduces the search space and derives optimal parameter sets, including the submatrix sizes, prefetch distances, loop unrolling depth, and parallelization scheme. Without a single line of assembly code, the GEMM kernels achieve performance comparable to the Intel MKL and outperform other open-source BLAS libraries.
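The general idea of tuning a blocking parameter empirically can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python rather than AVX-512 intrinsics, tunes only the submatrix (block) sizes, and searches a small hand-picked candidate list by timing each one. The function names `gemm_blocked` and `autotune` and the candidate values are hypothetical, chosen for illustration only.

```python
import random
import time


def gemm_naive(a, b, n):
    """Reference C = A * B for n x n matrices stored as lists of lists."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c


def gemm_blocked(a, b, n, mb, nb, kb):
    """Blocked C = A * B; (mb, nb, kb) are the tunable block sizes."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, mb):
        for k0 in range(0, n, kb):
            for j0 in range(0, n, nb):
                # Multiply one (mb x kb) block of A by one (kb x nb) block of B.
                for i in range(i0, min(i0 + mb, n)):
                    ci, ai = c[i], a[i]
                    for k in range(k0, min(k0 + kb, n)):
                        aik, bk = ai[k], b[k]
                        for j in range(j0, min(j0 + nb, n)):
                            ci[j] += aik * bk[j]
    return c


def autotune(n, candidates, repeats=2, seed=0):
    """Time each candidate (mb, nb, kb) and return the fastest plus all timings.

    The candidate list is an assumption of this sketch; a real tuner would
    derive it from cache sizes and register counts to shrink the search space.
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    timings = {}
    for params in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            gemm_blocked(a, b, n, *params)
            best = min(best, time.perf_counter() - t0)
        timings[params] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings
```

Calling `autotune(32, [(4, 4, 4), (8, 8, 8), (16, 16, 16)])` returns the fastest block-size triple on the current machine together with the measured times. The winner depends on the hardware and interpreter, which is the reason for measuring rather than fixing the parameters in advance.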


Pages: 101-110

Page count: 10
