High-Performance Matrix-Vector Multiplication on the GPU

Times cited: 0
Author
Sorensen, Hans Henrik Brandenborg [1]
Affiliation
[1] Tech Univ Denmark, Informat & Math Modelling, Bldg 321, DK-2800 Lyngby, Denmark
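Keywords
GPU; Matrix-Vector Multiplication; Dense linear algebra
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract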
In this paper, we develop a high-performance GPU kernel for matrix-vector multiplication, one of the most widely used dense linear algebra operations. The target hardware is the Nvidia Tesla 20-series (Fermi architecture), the most recent generation at the time of writing, which was designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the kernel so that it performs well for all matrix shapes and sizes.
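The two ideas in the abstract, exposing enough fine-grained parallelism for every matrix shape and choosing the work decomposition by auto-tuning, can be illustrated with a small CPU-side model. This is a hedged sketch, not the paper's Fermi kernel. It assumes a hypothetical scheme in which `p` cooperating threads share one row of `A` (thread `t` handles columns `t, t+p, t+2p, ...`, and the partial sums are then combined by a pairwise tree reduction), and a hypothetical tuner that times each candidate `p` on the actual matrix and keeps the fastest. The function names and the candidate set are illustrative assumptions. Sequential Python only emulates the threads, so the measured times reflect interpreter overhead rather than GPU behavior; what carries over is the structure of the search.

```python
import time
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def partial_sums(row: Sequence[float], x: Sequence[float], p: int) -> List[float]:
    """Emulate p threads on one row: thread t accumulates row[j] * x[j]
    for j = t, t + p, t + 2p, ... (neighbouring threads touch neighbouring
    elements, the access pattern a coalescing GPU kernel would use)."""
    sums = [0.0] * p
    for t in range(p):
        acc = 0.0
        for j in range(t, len(row), p):
            acc += row[j] * x[j]
        sums[t] = acc
    return sums


def tree_reduce(vals: List[float]) -> float:
    """Pairwise reduction in log2(len(vals)) steps, as a shared-memory
    reduction would do it. len(vals) must be a power of two."""
    vals = list(vals)
    step = len(vals) // 2
    while step > 0:
        for t in range(step):
            vals[t] += vals[t + step]
        step //= 2
    return vals[0]


def matvec(A: Matrix, x: Sequence[float], p: int) -> List[float]:
    """Compute y = A x with p cooperating 'threads' per row (hypothetical scheme)."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a positive power of two")
    return [tree_reduce(partial_sums(row, x, p)) for row in A]


def autotune(A: Matrix, x: Sequence[float],
             candidates: Sequence[int] = (1, 2, 4, 8, 16, 32),
             repeats: int = 3) -> Tuple[int, float]:
    """Empirical auto-tuning: time every candidate p on this matrix and
    return (fastest p, its mean time in seconds)."""
    best_p, best_t = candidates[0], float("inf")
    for p in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            matvec(A, x, p)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_t:
            best_p, best_t = p, elapsed
    return best_p, best_t


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    x = [1.0, 1.0, 1.0]
    print(matvec(A, x, 4))
    print(autotune(A, x))
```

In this model, `p = 1` makes each row a single work item, which leaves most of a many-core device idle when the matrix has few rows; a larger `p` raises the number of work items to `m * p` at the cost of an extra reduction step. Because the best trade-off depends on the matrix shape, the sketch picks `p` by measurement rather than by a fixed rule.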
Pages: 377-386
Number of pages: 10