High-Performance Matrix-Vector Multiplication on the GPU

Authors
Sorensen, Hans Henrik Brandenborg [1]
Affiliation
[1] Tech Univ Denmark, Informat & Math Modelling, Bldg 321, DK-2800 Lyngby, Denmark
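Keywords
GPU; Matrix-Vector Multiplication; Dense linear algebra
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract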
In this paper, we develop a high-performance GPU kernel for matrix-vector multiplication, one of the most popular dense linear algebra operations. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
Pages: 377-386
Number of pages: 10
Related papers
50 in total
  • [21] Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU
    Gao, Jiaquan
    Qi, Panpan
    He, Guixia
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [22] ACOUSTOOPTIC MATRIX-VECTOR MULTIPLICATION
    CAULFIELD, HJ
    RHODES, WT
    JOURNAL OF THE OPTICAL SOCIETY OF AMERICA, 1981, 71 (12) : 1626 - 1626
  • [23] A high-performance matrix–matrix multiplication methodology for CPU and GPU architectures
    Vasilios Kelefouras
    A. Kritikakou
    Iosif Mporas
    Vasilios Kolonias
    The Journal of Supercomputing, 2016, 72 : 804 - 844
  • [24] Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU
    Kubota, Yuji
    Takahashi, Daisuke
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2011, PT II, 2011, 6783 : 547 - 561
  • [25] Multi-GPU Implementation and Performance Optimization for CSR-Based Sparse Matrix-Vector Multiplication
    Guo, Ping
    Zhang, Changjiang
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 2419 - 2423
  • [26] TaiChi: A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU
    Gao, Jianhua
    Ji, Weixing
    Tan, Zhaonian
    Wang, Yizhuo
    Shi, Feng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3732 - 3745
  • [27] A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures
    Kelefouras, Vasilios
    Kritikakou, A.
    Mporas, Iosif
    Kolonias, Vasilios
    JOURNAL OF SUPERCOMPUTING, 2016, 72 (03) : 804 - 844
  • [28] On the Performance and Energy Efficiency of Sparse Matrix-Vector Multiplication on FPGAs
    Mpakos, Panagiotis
    Papadopoulou, Nikela
    Alverti, Chloe
    Goumas, Georgios
    Koziris, Nectarios
    PARALLEL COMPUTING: TECHNOLOGY TRENDS, 2020, 36 : 624 - 633
  • [29] Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore
    Gkountouvas, Theodoros
    Karakasis, Vasileios
    Kourtis, Kornilios
    Goumas, Georgios
    Koziris, Nectarios
    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 273 - 283
  • [30] Performance evaluation of the sparse matrix-vector multiplication on modern architectures
    Georgios Goumas
    Kornilios Kourtis
    Nikos Anastopoulos
    Vasileios Karakasis
    Nectarios Koziris
    The Journal of Supercomputing, 2009, 50 : 36 - 77