Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink

Cited by: 0

Authors:

Choi, Yea Rem [1]

Nikolskiy, Vsevolod [1]

Stegailov, Vladimir [2]

Affiliations:

[1] Natl Res Univ Higher Sch Econ, Moscow, Russia

[2] Russian Acad Sci, Joint Inst High Temp, Dolgoprudnyi, Russia

Source:

2020 Global Smart Industry Conference (GloSIC) | 2020

Keywords:

parallel computing; CUDA; GEMM; high-speed GPU interconnect

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

In this work we present an original GPU-only parallel matrix-matrix multiplication algorithm (C = alpha * A * B + beta * C) for servers with multiple GPUs connected by NVLink. The algorithm is implemented using CUDA. The data transfer patterns, the overlap of communication and computation, and the overall performance of the algorithm are considered. By adjusting the order of command calls and the tile sizes, we tune the algorithm for uninterrupted asynchronous data transfer and kernel execution. Two cases are considered: when all the data are stored on one GPU and when the matrices are distributed among several GPUs. The execution efficiency of the new algorithm is compared with that of cuBLAS-XT from the NVIDIA CUDA Toolkit.
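For readers unfamiliar with the operation named in the abstract, the following is a minimal host-side sketch of a tiled GEMM update, C = alpha * A * B + beta * C. It is not the authors' algorithm and involves no GPUs, CUDA streams, or NVLink transfers; it only illustrates how the result can be assembled from independent tile products, which is the kind of work unit a multi-GPU scheme could distribute. The function name `gemm_tiled` and the `tile` parameter are hypothetical, introduced here for illustration.

```python
def gemm_tiled(alpha, A, B, beta, C, tile=2):
    """Return alpha * A @ B + beta * C, accumulated tile by tile.

    Illustrative only: matrices are plain lists of lists, and `tile`
    is a hypothetical block edge length, not a value from the paper.
    """
    m, k, n = len(A), len(B), len(B[0])
    # Apply beta once up front; every tile product is then added on top.
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i0 in range(0, m, tile):          # tile rows of C
        for j0 in range(0, n, tile):      # tile columns of C
            for p0 in range(0, k, tile):  # tiles along the shared dimension
                # One (i0, j0, p0) block is an independent unit of work.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        out[i][j] += alpha * acc
    return out


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[1, 0], [0, 1], [1, 1]]
    C = [[1, 1], [1, 1]]
    print(gemm_tiled(2, A, B, 3, C))
```

The tile size changes only the order in which partial products are accumulated, not the mathematical result; in a GPU setting it would also determine how much data each transfer and each kernel launch handles.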


Pages: 354-361

Page count: 8
