Optimizing parallel GEMM routines using auto-tuning with Intel AVX-512

Cited by: 16
Authors
Kim, Raehyun [1]
Choi, Jaeyoung [1]
Lee, Myungho [2]
Affiliations
[1] Soongsil Univ, Seoul, South Korea
[2] Myongji Univ, Yongin, Gyeonggi, South Korea
Keywords
Manycore; Intel Xeon; Intel Xeon Phi; Autotuning; matrix-matrix multiplication; AVX-512
DOI
10.1145/3293320.3293334
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents optimal implementations of single- and double-precision general matrix-matrix multiplication (GEMM) routines for the Intel Xeon Phi Processor code-named Knights Landing (KNL) and the Intel Xeon Scalable Processors, based on an auto-tuning approach with the Intel AVX-512 intrinsic functions. Our auto-tuning approach precisely determines the parameters reflecting the target architectural features. It significantly reduces the search space and derives optimal parameter sets, including the size of submatrices, prefetch distances, loop unrolling depth, and parallelization scheme. Without a single line of assembly code, our GEMM kernels achieve performance comparable to the Intel MKL and outperform other open-source BLAS libraries.
Pages: 101-110
Number of pages: 10