TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs

Times cited: 3
Authors
Lu, Zhengyang [1]
Liu, Weifeng [1]
Affiliation
[1] China University of Petroleum, Super Scientific Software Laboratory, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Sparse matrix; Sparse triangular solve; Tiled algorithm; GPU
DOI
10.1007/s42514-023-00151-1
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in the sparse basic linear algebra subprograms (BLAS). Compared with sparse matrix-vector multiplication (SpMV), another level-2 sparse BLAS kernel, SpTRSV generally exposes less parallelism on many-core processors such as GPUs, because each solution component may depend on components computed before it. Much recent work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV. However, little work makes good use of the spatial structure of sparse matrices for SpTRSV on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV that optimizes SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. We design two implementations of TileSpTRSV, TileSpTRSV_level-set and TileSpTRSV_sync-free, built on top of the level-set and Sync-free algorithms, respectively. Experimental results on 16 representative matrices on a recent NVIDIA GPU show that TileSpTRSV_level-set achieves average speedups of 5.29x (up to 38.10x), 5.33x (up to 21.32x) and 2.62x (up to 12.87x) over cuSPARSE, Sync-free and Recblock, respectively.
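To illustrate why SpTRSV is harder to parallelize than SpMV, and what the level-set approach referred to in the abstract does, here is a minimal Python sketch of classic level-set SpTRSV for a lower-triangular matrix in CSR form. It is not the authors' TileSpTRSV code and does not use tiles; the function names (`level_sets`, `sptrsv_level_set`) and the assumption that every row stores its diagonal entry are illustrative choices. Rows are grouped into levels so that each row depends only on rows in earlier levels; on a GPU the rows of one level could run in parallel, with a synchronization between consecutive levels.

```python
# Hypothetical sketch of level-set SpTRSV (not the TileSpTRSV implementation).
# Solves L x = b for a lower-triangular L stored in CSR (row_ptr, col_idx, vals).
# Assumption: every row stores its diagonal entry and all column indices are <= the row index.

def level_sets(n, row_ptr, col_idx):
    """Group rows into levels; rows in the same level have no mutual dependencies."""
    level = [0] * n
    for i in range(n):
        lv = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # row i needs x[j], so it must come after row j's level
                lv = max(lv, level[j] + 1)
        level[i] = lv
    num_levels = max(level) + 1 if n > 0 else 0
    sets = [[] for _ in range(num_levels)]
    for i in range(n):
        sets[level[i]].append(i)
    return sets


def sptrsv_level_set(n, row_ptr, col_idx, vals, b):
    """Forward substitution processed level by level (sequentially here)."""
    x = [0.0] * n
    for rows in level_sets(n, row_ptr, col_idx):
        # A GPU version would assign each row in `rows` to its own thread or warp
        # and place a global synchronization between consecutive levels.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    s -= vals[k] * x[j]
            x[i] = s / diag
    return x


# Example: 4x4 lower-triangular matrix
#   [2 0 0 0]
#   [0 4 0 0]
#   [1 0 1 0]
#   [0 1 1 2]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 1.0, 1.0, 2.0]

print(level_sets(4, row_ptr, col_idx))  # [[0, 1], [2], [3]]
print(sptrsv_level_set(4, row_ptr, col_idx, vals, [2.0, 8.0, 4.0, 9.0]))  # [1.0, 2.0, 3.0, 2.0]
```

In this small example, four rows collapse into three levels, so at most two rows can ever be solved concurrently, whereas SpMV on the same matrix could process all four rows independently. The per-level barriers are one source of the synchronization overhead that the level-set and Sync-free lines of work mentioned above aim to reduce.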
Pages: 129-143
Number of pages: 15