Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink

Cited by: 0
Authors
Choi, Yea Rem [1]
Nikolskiy, Vsevolod [1]
Stegailov, Vladimir [2]
Affiliations
[1] Natl Res Univ Higher Sch Econ, Moscow, Russia
[2] Russian Acad Sci, Joint Inst High Temp, Dolgoprudnyi, Russia
Keywords
parallel computing; CUDA; GEMM; high-speed GPU interconnect
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this work, we present an original GPU-only parallel matrix-matrix multiplication algorithm (C = alpha * A * B + beta * C) for servers with multiple GPUs connected by NVLink. The algorithm is implemented in CUDA. We consider the data transfer patterns, the overlap of communication and computation, and the overall performance of the algorithm. By adjusting the order of command calls and the tile sizes, we tune the algorithm for uninterrupted asynchronous data transmission and kernel execution. Two cases are considered: all the data are stored on one GPU, or the matrices are distributed among several GPUs. The execution efficiency of the new algorithm is compared with the cuBLAS-XT library from the NVIDIA CUDA Toolkit.
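The GEMM operation named in the abstract, C = alpha * A * B + beta * C, can be illustrated with a small tiled sketch. The code below is not the authors' CUDA algorithm: it is a CPU-only, pure-Python illustration under the assumption that C is split into column tiles and each tile is assigned round-robin to a device. The names `tiled_gemm`, `tile`, and `num_devices` are hypothetical, and no data transfer or overlap is modeled.

```python
# Illustrative sketch only: a pure-Python tiled GEMM, NOT the paper's CUDA code.
# The round-robin assignment of column tiles to devices is an assumption.

def tiled_gemm(alpha, A, B, beta, C, tile, num_devices):
    """Return (alpha*A*B + beta*C, owner), computed tile by tile.

    owner maps the starting column of each column tile of C to the
    index of the hypothetical device that would compute it.
    """
    m, k, n = len(A), len(B), len(B[0])
    # Start from beta * C, then accumulate alpha * A * B tile by tile.
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    owner = {}
    for t, j0 in enumerate(range(0, n, tile)):
        owner[j0] = t % num_devices  # round-robin tile ownership
        j1 = min(j0 + tile, n)
        for i0 in range(0, m, tile):
            i1 = min(i0 + tile, m)
            for p0 in range(0, k, tile):
                p1 = min(p0 + tile, k)
                # Multiply one tile of A by one tile of B.
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        acc = 0
                        for p in range(p0, p1):
                            acc += A[i][p] * B[p][j]
                        out[i][j] += alpha * acc
    return out, owner
```

In a real multi-GPU setting, each tile product would be a kernel launch on the owning device, with tile copies issued asynchronously so that transfers and computation overlap. This sketch shows only the decomposition.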
Pages: 354-361
Page count: 8
Related papers
50 items in total
  • [31] Scaling sparse matrix-matrix multiplication in the Accumulo database
    Gunduz Vehbi Demirci
    Cevdet Aykanat
    Distributed and Parallel Databases, 2020, 38 : 31 - 62
  • [32] Scalability analysis of matrix-matrix multiplication on heterogeneous clusters
    Kalinov, A
    ISPDC 2004: THIRD INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING/HETEROPAR '04: THIRD INTERNATIONAL WORKSHOP ON ALGORITHMS, MODELS AND TOOLS FOR PARALLEL COMPUTING ON HETEROGENEOUS NETWORKS, PROCEEDINGS, 2004, : 303 - 309
  • [33] Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs
    Eyraud-Dubois, Lionel
    Lambert, Thomas
    PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2018,
  • [34] Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms
    Patwary, Md. Mostofa Ali
    Satish, Nadathur Rajagopalan
    Sundaram, Narayanan
    Park, Jongsoo
    Anderson, Michael J.
    Vadlamudi, Satya Gautam
    Das, Dipankar
    Pudov, Sergey G.
    Pirogov, Vadim O.
    Dubey, Pradeep
    HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2015, 2015, 9137 : 48 - 57
  • [35] Design space exploration for sparse matrix-matrix multiplication on FPGAs
    Lin, Colin Yu
    Wong, Ngai
    So, Hayden Kwok-Hay
    INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2013, 41 (02) : 205 - 219
  • [36] PARALLEL SPARSE MATRIX-MATRIX MULTIPLICATION AND INDEXING: IMPLEMENTATION AND EXPERIMENTS
    Buluc, Aydin
    Gilbert, John R.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04) : C170 - C191
  • [37] Matrix-Matrix Multiplication on a Large Register File Architecture with Indirection
    Sreedhar, Dheeraj
    Derby, J. H.
    Montoye, R. K.
    Johnson, C. L.
    2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2014,
  • [38] Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores
    Zachariadis, Orestis
    Satpute, Nitin
    Gomez-Luna, Juan
    Olivares, Joaquin
    COMPUTERS & ELECTRICAL ENGINEERING, 2020, 88 (88)
  • [39] Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
    Moon, Gordon Euhyun
    Kwon, Hyoukjun
    Jeong, Geonhwa
    Chatarasi, Prasanth
    Rajamanickam, Sivasankaran
    Krishna, Tushar
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (04) : 1002 - 1014
  • [40] On-line soft error correction in matrix-matrix multiplication
    Wu, Panruo
    Ding, Chong
    Chen, Longxiang
    Davies, Teresa
    Karlsson, Christer
    Chen, Zizhong
    JOURNAL OF COMPUTATIONAL SCIENCE, 2013, 4 (06) : 465 - 472