High-Performance Matrix-Vector Multiplication on the GPU

Cited by: 0

Authors:

Sorensen, Hans Henrik Brandenborg ^{[1]}

Affiliation:

[1] Tech Univ Denmark, Informat & Math Modelling, Bldg 321, DK-2800 Lyngby, Denmark

Source:

EURO-PAR 2011: PARALLEL PROCESSING WORKSHOPS, PT I | 2012 / Vol. 7155

Keywords:

GPU; Matrix-Vector Multiplication; Dense linear algebra

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we develop a high-performance GPU kernel for one of the most widely used dense linear algebra operations, matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.

Pages: 377-386

Page count: 10
