Fast Kronecker Matrix-Matrix Multiplication on GPUs

Times cited: 0

Authors:

Jangda, Abhinav [1]

Yadav, Mohit [2]

Affiliations:

[1] Microsoft Research, Redmond, WA 98052, USA

[2] University of Massachusetts, Amherst, MA 01003, USA

Source:

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2024), 2024

Keywords:

Graphics Processing Units; CUDA; Kronecker Product; Linear Algebra; Tensor Contraction; CPU

DOI:

10.1145/3627535.3638489

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods]

Subject classification code:

081202

Abstract:

Kronecker Matrix-Matrix Multiplication (Kron-Matmul) is the multiplication of a matrix with the Kronecker product of several smaller matrices. Kron-Matmul is a core operation in many scientific and machine learning computations. State-of-the-art Kron-Matmul implementations are built from existing tensor algebra operations, such as matrix multiplication, transpose, and tensor-matrix multiplication. However, this design choice rules out several Kron-Matmul-specific optimizations and thus leaves significant performance on the table. To address this issue, we present FastKron, an efficient technique for Kron-Matmul on single and multiple GPUs. FastKron does not depend on existing linear algebra operations, which enables several new optimizations for Kron-Matmul. As a result, it runs up to 40.7x faster on 1 GPU and up to 7.85x faster on 16 GPUs than existing implementations.
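To make the definition concrete, the pure-Python sketch below computes Y = X (F1 ⊗ ... ⊗ FN), where X is M x (P1 * ... * PN) and each factor Fi is Pi x Qi, in two ways. The naive version materializes the full Kronecker product and does one large matmul. The second is the classic factor-at-a-time formulation, often called the shuffle algorithm: it multiplies by one factor, transposes, and repeats, never building the full product. This is only an illustration of the kind of matmul-plus-transpose composition the abstract attributes to existing implementations; it is not FastKron's GPU kernel. All function names (`kron`, `kron_matmul_naive`, `kron_matmul_shuffle`) are hypothetical.

```python
from functools import reduce
from typing import List

Matrix = List[List[float]]  # plain nested lists; ints work too


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (m x k) and a (k x n) matrix."""
    k, n = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(n)] for row in a]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: (a ⊗ b)[i*pb + k][j*qb + l] = a[i][j] * b[k][l]."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


def kron_matmul_naive(x: Matrix, factors: List[Matrix]) -> Matrix:
    """Materialize F1 ⊗ ... ⊗ FN explicitly, then do one large matmul."""
    return matmul(x, reduce(kron, factors))


def kron_matmul_shuffle(x: Matrix, factors: List[Matrix]) -> Matrix:
    """Apply one factor at a time (last to first) without building the full product."""
    out = []
    for row in x:
        cur = list(row)
        for f in reversed(factors):
            p, q = len(f), len(f[0])
            rows = len(cur) // p
            # View cur as a (rows x p) matrix and multiply it by f (p x q).
            prod = [[sum(cur[r * p + t] * f[t][c] for t in range(p)) for c in range(q)]
                    for r in range(rows)]
            # Transpose to (q x rows) and flatten, so the index of the next
            # (earlier) factor becomes the fastest-varying one.
            cur = [prod[r][c] for c in range(q) for r in range(rows)]
        out.append(cur)
    return out


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8]]  # M = 2 rows, P1 * P2 = 2 * 2 columns
    f1 = [[1, 2, 0], [0, 1, 3]]       # 2 x 3
    f2 = [[2, 1], [1, 0]]             # 2 x 2
    print(kron_matmul_shuffle(x, [f1, f2]))
    # prints [[4, 1, 18, 5, 30, 9], [16, 5, 54, 17, 66, 21]]
    assert kron_matmul_shuffle(x, [f1, f2]) == kron_matmul_naive(x, [f1, f2])
```

The naive version allocates a (P1 * ... * PN) x (Q1 * ... * QN) matrix, while the shuffle version keeps only a per-row buffer no larger than the biggest intermediate result. Each step is a small matmul followed by a transpose, which is why this formulation maps directly onto existing GEMM and transpose kernels.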


Pages: 390-403

Number of pages: 14