Scalable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs

Cited by: 3
Authors
Tanabe, Noboru [1 ]
Ogawa, Yuuka [2 ]
Takata, Masami [2 ]
Joe, Kazuki [2 ]
Affiliations
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Kawasaki, Kanagawa 2128582, Japan
[2] Nara Womens Univ, Dept Adv Informat & Comp Sci, Nara, Japan
Keywords
GPGPU; Scatter/Gather; Functional Memory; Matrix-Vector Multiplication;
DOI
10.1109/PDP.2011.92
Chinese Library Classification
TP3 [Computing and Computer Technology];
Discipline Code
0812;
Abstract
Sparse matrix-vector multiplication on GPUs faces a serious problem when the vector is too long to fit in the GPU's device memory. To solve this problem, we propose a novel software-hardware hybrid method for a heterogeneous system in which GPUs and functional memory modules are connected by PCI Express. The functional memory provides a huge memory capacity together with scatter/gather operations. We perform a preliminary evaluation of the proposed method using a sparse matrix benchmark collection. We observe that, by converting indirect references into direct references without exhausting the GPU's cache memory, the proposed method achieves a 4.1-fold speedup over conventional methods. The proposed method is intrinsically highly scalable in the number of GPUs because communication among GPUs is completely eliminated. We therefore estimate that the performance of the proposed method can be expressed as the single-GPU execution performance, which may be limited by the burst-transfer bandwidth of PCI Express, multiplied by the number of GPUs.
Pages: 101-108
Number of pages: 8