Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink

Cited by: 0

Authors:

Choi, Yea Rem [1]

Nikolskiy, Vsevolod [1]

Stegailov, Vladimir [2]

Affiliations:

[1] Natl Res Univ Higher Sch Econ, Moscow, Russia

[2] Russian Acad Sci, Joint Inst High Temp, Dolgoprudnyi, Russia

Source:

2020 GLOBAL SMART INDUSTRY CONFERENCE (GLOSIC) | 2020

Keywords:

parallel computing; CUDA; GEMM; high-speed GPU interconnect

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

In this work we present an original GPU-only parallel matrix-matrix multiplication algorithm (C = alpha * A * B + beta * C) for servers with multiple GPUs connected by NVLink. The algorithm is implemented in CUDA. We examine the data transfer patterns, the overlap of communication and computation, and the overall performance of the algorithm. By adjusting the order of command calls and the tile sizes, we tune the algorithm so that asynchronous data transfers and kernel execution proceed without interruption. Two cases are considered: all the data are stored on one GPU, or the matrices are distributed among several GPUs. The execution efficiency of the new algorithm is compared with cuBLAS-XT from the NVIDIA CUDA Toolkit.
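As a rough illustration of the operation named in the abstract, the sketch below models C = alpha * A * B + beta * C split into output tiles on the CPU, in plain Python. It is not the authors' CUDA implementation: the names `tiled_gemm` and `reference_gemm`, the round-robin assignment of tiles to "devices", and the default tile size are all assumptions made for this example, and no data transfer or overlap is modelled.

```python
# Illustrative only: a CPU-side, pure-Python model of tiled GEMM,
# C = alpha * A * B + beta * C, with output tiles dealt round-robin to
# hypothetical "devices". It does not reproduce the paper's CUDA code.

def tiled_gemm(alpha, A, B, beta, C, tile=2, num_devices=2):
    """Return (result, owner): the updated matrix and a tile-to-device map."""
    m, k, n = len(A), len(B), len(B[0])
    # Start from beta * C, then add alpha * A * B tile by tile.
    result = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    owner = {}
    tile_id = 0
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # Output tiles are independent, so any device could compute one.
            owner[(i0, j0)] = tile_id % num_devices
            tile_id += 1
            for p0 in range(0, k, tile):
                # Accumulate one tile-sized partial product into the output tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        result[i][j] += alpha * acc
    return result, owner


def reference_gemm(alpha, A, B, beta, C):
    """Untiled reference used to check the tiled version."""
    m, k, n = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(n)] for i in range(m)]
```

The tile size here only controls how the loops are blocked. In a multi-GPU setting it would also determine how much data moves per transfer, which is the quantity the abstract says is tuned.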


Pages: 354-361

Page count: 8


