Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format

Cited by: 8
Authors
Shi, Shaohuai [1]
Wang, Qiang [2]
Chu, Xiaowen [2]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
Sparse Matrix Multiplication; COO; GCOO; GPU;
DOI
10.1109/ICPADS51040.2020.00013
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Multiplication of a sparse matrix by a dense matrix (SpDM) is widely used in areas such as scientific computing and machine learning. However, existing work overlooks the performance optimization of SpDM on modern many-core architectures such as GPUs. Sparse storage formats keep sparse matrices in a memory-saving form, but the irregular data access they induce makes SpDM hard to optimize on modern GPUs, resulting in low resource utilization and poor performance. In this paper, we refer to the roofline performance model of GPUs to design an efficient SpDM algorithm called GCOOSpDM, in which we exploit coalesced global memory access, fast shared memory reuse, and more operations per byte of global memory traffic. Experiments are conducted on three Nvidia GPUs (i.e., GTX 980, GTX Titan X Pascal, and Tesla P100) using a large number of matrices, including a public dataset and randomly generated matrices. Experimental results show that GCOOSpDM achieves a 1.5-8x speedup over Nvidia's library cuSPARSE on many matrices.
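For readers unfamiliar with the operation, the following is a minimal sequential sketch of SpDM with the sparse operand stored in plain COO (coordinate) format. It is not the GCOOSpDM kernel and does not reproduce the GCOO layout, whose details are not given in this abstract; the function name `coo_spdm` and its parameters are hypothetical and chosen only for illustration. The inner loop shows why access to the dense operand is irregular: which row of B is read depends on the column index of each stored nonzero.

```python
from typing import List, Sequence


def coo_spdm(
    rows: Sequence[int],
    cols: Sequence[int],
    vals: Sequence[float],
    dense: Sequence[Sequence[float]],
    n_rows: int,
) -> List[List[float]]:
    """Compute C = A @ B, with A given as COO triples and B as a dense matrix.

    Hypothetical reference routine for illustration only; it is not the
    paper's GPU algorithm.
    """
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        b_row = dense[c]   # the row of B that is read is dictated by the nonzero's column
        out_row = out[r]   # the row of C that is updated is dictated by the nonzero's row
        for j in range(n_cols):
            out_row[j] += v * b_row[j]
    return out


# A = [[1, 0, 2],
#      [0, 0, 3]] stored as COO triples (row, col, value)
a_rows = [0, 0, 1]
a_cols = [0, 2, 2]
a_vals = [1.0, 2.0, 3.0]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = coo_spdm(a_rows, a_cols, a_vals, b, n_rows=2)
print(c)  # [[11.0, 14.0], [15.0, 18.0]]
```

Each stored nonzero contributes one scaled row of B to one row of C, so the work is proportional to the number of nonzeros times the number of columns of B.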
Pages: 19-26
Number of pages: 8