Scalable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs

Cited by: 3
Authors
Tanabe, Noboru [1 ]
Ogawa, Yuuka [2 ]
Takata, Masami [2 ]
Joe, Kazuki [2 ]
Affiliations
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Kawasaki, Kanagawa 2128582, Japan
[2] Nara Womens Univ, Dept Adv Informat & Comp Sci, Nara, Japan
Keywords
GPGPU; Scatter/Gather; Functional Memory; Matrix-Vector Multiplication;
DOI
10.1109/PDP.2011.92
Chinese Library Classification
TP3 [Computing and Computer Technology];
Discipline Code
0812;
Abstract
Sparse matrix-vector multiplication on GPUs faces a serious problem when the vector is too long to fit in the GPU's device memory. To solve this problem, we propose a novel software-hardware hybrid method for a heterogeneous system in which GPUs and functional memory modules are connected by PCI Express. The functional memory provides a huge memory capacity together with scatter/gather operations. We perform a preliminary evaluation of the proposed method using a sparse matrix benchmark collection. We observe that, by converting indirect references into direct references without exhausting the GPU's cache memory, the proposed method achieves a 4.1-fold speedup over conventional methods. The proposed method is intrinsically highly scalable in the number of GPUs because communication among GPUs is completely eliminated. We therefore estimate that the performance of the proposed method can be expressed as the single-GPU execution performance, which may be limited by the burst-transfer bandwidth of PCI Express, multiplied by the number of GPUs.
Pages: 101-108
Number of pages: 8