Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink

Cited by: 0

Authors:

Choi, Yea Rem [1]

Nikolskiy, Vsevolod [1]

Stegailov, Vladimir [2]

Affiliations:

[1] Natl Res Univ Higher Sch Econ, Moscow, Russia

[2] Russian Acad Sci, Joint Inst High Temp, Dolgoprudnyi, Russia

Source:

2020 GLOBAL SMART INDUSTRY CONFERENCE (GLOSIC) | 2020

Keywords:

parallel computing; CUDA; GEMM; high-speed GPU interconnect

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

In this work we present an original GPU-only parallel matrix-matrix multiplication algorithm (C = alpha * A * B + beta * C) for servers with multiple GPUs connected by NVLink. The algorithm is implemented using CUDA. The data transfer patterns, the overlap of communication and computation, and the overall performance of the algorithm are considered. By adjusting the order in which commands are issued and the tile sizes, we tune the algorithm for uninterrupted asynchronous data transfer and kernel execution. Two cases are considered: when all the data are stored on one GPU and when the matrices are distributed among several GPUs. The execution efficiency of the new algorithm is compared with cuBLAS-XT from the Nvidia CUDA Toolkit.
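As a rough illustration of the operation named in the abstract, the following sketch computes C = alpha * A * B + beta * C tile by tile and assigns each output tile to a simulated device in round-robin order. It is not the authors' algorithm: it runs on the CPU in plain Python, does not model NVLink transfers or the overlap of communication and computation, and the names `gemm_reference`, `gemm_tiled`, `tile`, and `num_devices` are hypothetical, introduced only for this example.

```python
def gemm_reference(alpha, A, B, beta, C):
    """Untiled C = alpha * A * B + beta * C on lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(m)]
        for i in range(n)
    ]


def gemm_tiled(alpha, A, B, beta, C, tile=2, num_devices=2):
    """Tiled GEMM; output tiles are assigned round-robin to simulated devices.

    Returns (result, owners), where owners maps the top-left corner
    (row0, col0) of each output tile to a device index. The device
    assignment is bookkeeping only (an assumption for illustration);
    all arithmetic runs sequentially on the CPU.
    """
    n, k, m = len(A), len(B), len(B[0])
    # Start from beta * C, then accumulate alpha * A * B tile by tile.
    out = [[beta * C[i][j] for j in range(m)] for i in range(n)]
    owners = {}
    tile_id = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            owners[(i0, j0)] = tile_id % num_devices
            tile_id += 1
            # Walk the shared dimension in tile-sized steps.
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = 0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        out[i][j] += alpha * acc
    return out, owners
```

In a real multi-GPU setting, the tile size would presumably trade off transfer granularity against kernel efficiency, which is the kind of tuning the abstract describes; this sketch only shows that a tiled decomposition reproduces the untiled result.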


Pages: 354 - 361

Page count: 8

50 items in total

[41] Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication
Akbudak, Kadir
Selvitopi, Oguz
Aykanat, Cevdet
ACM TRANSACTIONS ON PARALLEL COMPUTING, 2018, 4 (03)
[42] Parallel Algorithm for Quasi-Band Matrix-Matrix Multiplication
Vooturi, Dharma Teja
Kothapalli, Kishore
PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I, 2016, 9573 : 106 - 115
[43] Fast Compressive Large-Scale Matrix-Matrix Multiplication Using Product Codes
Ocal, Orhan
Ramchandran, Kannan
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020 : 1426 - 1431
[44] On Large-Scale Matrix-Matrix Multiplication On Compressed Structures
Krishna, Sudhindra Gopal
Narasimhan, Aditya
Radhakrishnan, Sridhar
Veras, Richard
2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021 : 2976 - 2985
[45] A Matrix-Matrix Multiplication methodology for single/multi-core architectures using SIMD
Kelefouras, Vasilios
Kritikakou, Angeliki
Goutis, Costas
JOURNAL OF SUPERCOMPUTING, 2014, 68 (03) : 1418 - 1440
[46] Sparse approximate matrix-matrix multiplication for density matrix purification with error control
Artemov, Anton G.
Rubensson, Emanuel H.
JOURNAL OF COMPUTATIONAL PHYSICS, 2021, 438
[47] Bandwidth Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking
Gu, Zhixiang
Moreira, Jose
Edelsohn, David
Azad, Ariful
PROCEEDINGS OF THE 32ND ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES (SPAA '20), 2020 : 293 - 303
[48] Strassen's Matrix Multiplication on GPUs
Li, Junjie
Ranka, Sanjay
Sahni, Sartaj
2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011 : 157 - 164
[49] Optimizing Hardware Accelerated General Matrix-Matrix Multiplication for CNNs on FPGAs
Ahmad, Afzal
Pasha, Muhammad Adeel
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2020, 67 (11) : 2692 - 2696
[50] Optimum Prefetching Patterns Searching: A Case Study of Matrix-Matrix Multiplication
Khomongkonudom, Varintom
Chaikarn, Panyayot
2022 37TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2022), 2022 : 349 - 352
