Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink

Cited by: 0
Authors
Choi, Yea Rem [1]
Nikolskiy, Vsevolod [1]
Stegailov, Vladimir [2]
Affiliations
[1] Natl Res Univ Higher Sch Econ, Moscow, Russia
[2] Russian Acad Sci, Joint Inst High Temp, Dolgoprudnyi, Russia
Keywords
parallel computing; CUDA; GEMM; high-speed GPU interconnect
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this work we present an original GPU-only parallel matrix-matrix multiplication algorithm (C = alpha * A * B + beta * C) for servers with multiple GPUs connected by NVLink. The algorithm is implemented in CUDA. We examine its data transfer patterns, the overlap of communication with computation, and its overall performance. By adjusting the order in which commands are issued and the tile sizes, we tune the algorithm so that asynchronous data transfers and kernel execution proceed without interruption. Two cases are considered: all the data are stored on one GPU, or the matrices are distributed among several GPUs. The execution efficiency of the new algorithm is compared with that of cuBLAS-XT from the NVIDIA CUDA Toolkit.
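The abstract does not give the authors' tiling scheme or command ordering, so the following is only a minimal CPU-side sketch of the general idea of splitting C = alpha * A * B + beta * C into tiles and assigning them to devices. The row-wise split, the round-robin assignment, and the names `split_rows`, `gemm_tiled`, `tile_rows`, and `num_devices` are all assumptions made for illustration; no GPU, stream, or NVLink transfer is involved.

```python
# Illustrative sketch only: NOT the paper's algorithm.
# Tiles of C are computed independently, which is what lets a multi-GPU
# implementation overlap the transfer of one tile with the computation
# of another.

def gemm_reference(alpha, A, B, beta, C):
    """Plain GEMM on lists of lists: returns alpha*A*B + beta*C."""
    k = len(B)
    m = len(B[0])
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(m)]
        for i in range(len(A))
    ]

def split_rows(n_rows, tile_rows):
    """Return half-open (start, stop) row ranges covering 0..n_rows."""
    return [(start, min(start + tile_rows, n_rows))
            for start in range(0, n_rows, tile_rows)]

def gemm_tiled(alpha, A, B, beta, C, tile_rows, num_devices):
    """Compute the GEMM one row tile at a time.

    Each tile is assigned to a hypothetical device index in round-robin
    order (an assumption); the schedule is returned alongside the result
    so the assignment can be inspected.
    """
    result = [None] * len(A)
    schedule = []
    for t, (lo, hi) in enumerate(split_rows(len(A), tile_rows)):
        device = t % num_devices  # hypothetical device id
        schedule.append((device, lo, hi))
        # A row tile of C depends only on the matching rows of A and all of B.
        result[lo:hi] = gemm_reference(alpha, A[lo:hi], B, beta, C[lo:hi])
    return result, schedule

if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    B = [[1, 0], [0, 1]]
    C = [[1, 1], [1, 1], [1, 1]]
    out, plan = gemm_tiled(2, A, B, 3, C, tile_rows=2, num_devices=2)
    print(out)
    print(plan)
```

In this sketch the tile size only changes how the work is partitioned, not the result; in a real multi-GPU setting it would also determine how much data moves per transfer, which is the kind of parameter the abstract says is tuned.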
Pages: 354-361
Page count: 8