High-Performance Matrix-Vector Multiplication on the GPU

Authors
Sorensen, Hans Henrik Brandenborg [1]
Affiliation
[1] Tech Univ Denmark, Informat & Math Modelling, Bldg 321, DK-2800 Lyngby, Denmark
Keywords
GPU; Matrix-Vector Multiplication; Dense linear algebra;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
Pages: 377 - 386
Page count: 10
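
The two ideas named in the abstract, splitting each row's dot product across several workers and auto-tuning the split, can be illustrated with a small CPU-side sketch. This is not the paper's kernel: the functions `gemv_reference`, `gemv_split_rows`, and `autotune`, the `threads_per_row` parameter, and the candidate list are all hypothetical names chosen for illustration, and the "threads" are emulated sequentially in plain Python.

```python
import time


def gemv_reference(A, x):
    """Plain y = A x with one sequential dot product per row."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gemv_split_rows(A, x, threads_per_row):
    """Emulate a kernel in which `threads_per_row` workers share each row.

    Each emulated worker accumulates a strided partial sum; the partial
    sums are then reduced, much as a GPU kernel might do in shared memory.
    (Assumption: this only mimics the work decomposition, not real hardware.)
    """
    y = []
    for row in A:
        partial = [0.0] * threads_per_row
        for t in range(threads_per_row):
            # Worker t handles columns t, t + threads_per_row, ...
            for j in range(t, len(row), threads_per_row):
                partial[t] += row[j] * x[j]
        y.append(sum(partial))
    return y


def autotune(A, x, candidates=(1, 2, 4, 8)):
    """Time each candidate split and return the fastest one (toy tuner)."""
    best, best_time = None, float("inf")
    for t in candidates:
        start = time.perf_counter()
        gemv_split_rows(A, x, t)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = t, elapsed
    return best


A = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
x = [1.0, 0.5, 2.0, -1.0]
y_ref = gemv_reference(A, x)
y_split = gemv_split_rows(A, x, 2)
chosen = autotune(A, x)
```

On a real GPU the best split generally depends on the matrix shape (for example, tall-and-skinny versus short-and-wide), which is the kind of decision an auto-tuner would make per problem size; the timing loop above only gestures at that process.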