TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs

Cited by: 3
Authors
Lu, Zhengyang [1]
Liu, Weifeng [1]
Affiliations
[1] China University of Petroleum, Super Scientific Software Laboratory, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Sparse matrix; Sparse triangular solve; Tiled algorithm; GPU
DOI
10.1007/s42514-023-00151-1
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in the sparse basic linear algebra subprograms (BLAS). Compared with sparse matrix-vector multiplication (SpMV), another level-2 sparse BLAS kernel, SpTRSV is in general harder to parallelize efficiently on many-core processors such as GPUs. Much recent work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV. However, less work makes good use of the sparse spatial structure for SpTRSV on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV that optimizes SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. We design two implementations of TileSpTRSV, i.e., TileSpTRSV_level-set and TileSpTRSV_sync-free, on top of the level-set and Sync-free algorithms, respectively. Experimental results on 16 representative matrices on a latest-generation NVIDIA GPU show that TileSpTRSV_level-set achieves on average 5.29x (up to 38.10x), 5.33x (up to 21.32x) and 2.62x (up to 12.87x) speedups over the cuSPARSE, Sync-free and Recblock algorithms, respectively.
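For readers unfamiliar with the level-set approach the abstract builds on, the following is a minimal serial sketch in Python. It is not the paper's tiled GPU implementation: it only illustrates the generic idea of grouping the rows of a lower-triangular matrix into dependency levels and solving level by level. The function names (`level_sets`, `sptrsv_level_set`), the CSR array names, and the 4x4 example matrix are all assumptions made for illustration, and a nonzero diagonal entry is assumed to be stored in every row.

```python
def level_sets(row_ptr, col_idx, n):
    """Group rows of a lower-triangular CSR matrix into dependency levels.

    Row i depends on row j when L[i][j] != 0 with j < i. A row's level is
    one more than the deepest level among the rows it depends on.
    """
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_level_set(row_ptr, col_idx, vals, b):
    """Solve L x = b for lower-triangular L in CSR form, level by level.

    Assumes every row stores a nonzero diagonal entry (illustrative
    assumption). Rows inside one level are mutually independent, so a GPU
    implementation could process them in parallel; this sketch simply
    loops over them serially.
    """
    n = len(b)
    x = [0.0] * n
    for rows in level_sets(row_ptr, col_idx, n):
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical example matrix:
# L = [[2, 0, 0, 0],
#      [0, 4, 0, 0],
#      [1, 0, 1, 0],
#      [0, 2, 3, 5]]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 2.0, 3.0, 5.0]
b = [2.0, 8.0, 4.0, 33.0]

levels = level_sets(row_ptr, col_idx, 4)
x = sptrsv_level_set(row_ptr, col_idx, vals, b)
```

In this example rows 0 and 1 have no dependencies and form the first level, row 2 waits on row 0, and row 3 waits on rows 1 and 2. The number of levels bounds the number of synchronization points, which is why the structure of the matrix matters so much for parallel SpTRSV.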
Pages: 129 - 143
Number of pages: 15
Related papers
50 in total
  • [1] TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs
    Zhengyang Lu
    Weifeng Liu
    CCF Transactions on High Performance Computing, 2023, 5 : 129 - 143
  • [2] TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs
    Niu, Yuyao
    Lu, Zhengyang
    Ji, Haonan
    Song, Shuhui
    Jin, Zhou
    Liu, Weifeng
    PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022: 90 - 106
  • [3] TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs
    Ji, Haonan
    Song, Huimin
    Lu, Shibo
    Jin, Zhou
    Tan, Guangming
    Liu, Weifeng
    51st International Conference on Parallel Processing, ICPP 2022, 2022
  • [4] A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector
    Jin, Song
    Pei, Songwei
    Wang, Yu
    Qi, Yincheng
    Cluster Computing: The Journal of Networks, Software Tools and Applications, 2021, 24 (02) : 1317 - 1330
  • [5] A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector
    Song Jin
    Songwei Pei
    Yu Wang
    Yincheng Qi
    Cluster Computing, 2021, 24 : 1317 - 1330
  • [6] TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs
    Niu, Yuyao
    Lu, Zhengyang
    Dong, Meichen
    Jin, Zhou
    Liu, Weifeng
    Tan, Guangming
    2021 IEEE 35th International Parallel and Distributed Processing Symposium (IPDPS), 2021: 68 - 78
  • [7] A Hybrid Synchronization Mechanism for Parallel Sparse Triangular Solve
    Sandhu, Prabhjot
    Verbrugge, Clark
    Hendren, Laurie
    Languages and Compilers for Parallel Computing (LCPC 2021), 2022, 13181 : 118 - 133
  • [8] Efficient Block Algorithms for Parallel Sparse Triangular Solve
    Lu, Zhengyang
    Niu, Yuyao
    Liu, Weifeng
    Proceedings of the 49th International Conference on Parallel Processing, ICPP 2020, 2020
  • [9] AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs
    Hu, Zhengding
    Sun, Jingwei
    Li, Zhongyang
    Sun, Guangzhong
    ACM Transactions on Architecture and Code Optimization, 2024, 21 (04)
  • [10] CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs
    Su, Jiya
    Zhang, Feng
    Liu, Weifeng
    He, Bingsheng
    Wu, Ruofan
    Du, Xiaoyong
    Wang, Rujia
    Proceedings of the 49th International Conference on Parallel Processing, ICPP 2020, 2020