Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink

Times cited: 0

Authors:

Choi, Yea Rem [1]

Nikolskiy, Vsevolod [1]

Stegailov, Vladimir [2]

Affiliations:

[1] National Research University Higher School of Economics, Moscow, Russia

[2] Russian Academy of Sciences, Joint Institute for High Temperatures, Dolgoprudnyi, Russia

Source:

2020 Global Smart Industry Conference (GloSIC) | 2020

Keywords:

parallel computing; CUDA; GEMM; high-speed GPU interconnect

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Applications of computers]

Discipline classification codes:

081203; 0835

Abstract:

In this work we present an original GPU-only parallel algorithm for matrix-matrix multiplication (C = alpha*A*B + beta*C) on servers with multiple GPUs connected by NVLink. The algorithm is implemented in CUDA. We examine its data transfer patterns, the overlap of communication and computation, and its overall performance. By controlling the order in which commands are issued and the sizes of the tiles, we tune the algorithm so that asynchronous data transfers and kernel execution proceed without interruption. Two cases are considered: all the data stored on a single GPU, and the matrices distributed among several GPUs. The execution efficiency of the new algorithm is compared with that of cuBLAS-XT from the NVIDIA CUDA Toolkit.
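
The tiled decomposition of C = alpha*A*B + beta*C mentioned in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' algorithm: the function name `tiled_gemm_multi_device`, the `tile` and `ndev` parameters, and the round-robin assignment of C tiles to devices are hypothetical, and CUDA streams, NVLink transfers and GEMM kernels are replaced by plain Python loops. It shows only that the product splits into independent C tiles, each accumulated over tiles of the shared k dimension, with beta applied once per tile.

```python
from typing import List

Matrix = List[List[float]]


def tiled_gemm_multi_device(alpha: float, A: Matrix, B: Matrix, beta: float,
                            C: Matrix, tile: int, ndev: int) -> List[int]:
    """Compute C = alpha*A*B + beta*C in place, one square tile of C at a time.

    Hypothetical scheme: tiles of C are assigned round-robin to `ndev`
    simulated devices. Returns the owning device of each C tile in
    row-major tile order.
    """
    m, k = len(A), len(A[0])
    n = len(B[0])
    owners: List[int] = []
    tile_id = 0
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            owners.append(tile_id % ndev)  # round-robin owner of this C tile
            tile_id += 1
            i1, j1 = min(i0 + tile, m), min(j0 + tile, n)
            # Scale the existing C tile by beta exactly once.
            acc = [[beta * C[i][j] for j in range(j0, j1)] for i in range(i0, i1)]
            # Accumulate alpha * A[i0:i1, p0:p1] * B[p0:p1, j0:j1] over k-tiles.
            # On real GPUs, the next A/B tiles would be copied asynchronously
            # while the current tile product is being computed.
            for p0 in range(0, k, tile):
                p1 = min(p0 + tile, k)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        s = 0.0
                        for p in range(p0, p1):
                            s += A[i][p] * B[p][j]
                        acc[i - i0][j - j0] += alpha * s
            # Write the finished tile back into C.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    C[i][j] = acc[i - i0][j - j0]
    return owners


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    C = [[1.0, 1.0], [1.0, 1.0]]
    print(tiled_gemm_multi_device(2.0, A, B, 0.5, C, tile=1, ndev=2))
    print(C)
```

Because each C tile depends only on one block row of A and one block column of B, the result is the same for any `tile` and `ndev`. The choices that the abstract says are tuned (tile size and command order) affect how well transfers overlap with computation, not the numerical result.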


Pages: 354-361

Number of pages: 8
