TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs

Cited by: 3
Authors
Lu, Zhengyang [1]
Liu, Weifeng [1]
Affiliation
[1] China Univ Petr, Super Sci Software Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sparse matrix; Sparse triangular solve; Tiled algorithm; GPU;
DOI
10.1007/s42514-023-00151-1
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in sparse basic linear algebra subprograms (BLAS). Compared with sparse matrix-vector multiplication (SpMV), another level-2 sparse BLAS kernel, SpTRSV is in general harder to parallelize efficiently on many-core processors such as GPUs. Much recent work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV. However, less work makes good use of the spatial structure of sparse matrices for SpTRSV on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV that optimizes SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. We design two implementations, TileSpTRSV_level-set and TileSpTRSV_sync-free, built on top of the level-set and Sync-free algorithms, respectively. Experimental results on 16 representative matrices on a recent NVIDIA GPU show that TileSpTRSV_level-set achieves average speedups of 5.29x (up to 38.10x), 5.33x (up to 21.32x) and 2.62x (up to 12.87x) over the cuSPARSE, Sync-free and Recblock algorithms, respectively.
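For readers unfamiliar with the level-set approach the abstract builds on, the following is a minimal serial sketch of the general idea, not the TileSpTRSV algorithm itself. It assumes a lower-triangular matrix in CSR form with a nonzero diagonal entry in every row; the function names `level_sets` and `sptrsv_level_set` and the 4x4 example matrix are hypothetical and chosen only for illustration. Rows are grouped into levels so that every row depends only on rows in earlier levels; on a GPU, the rows within one level could be solved in parallel, with a synchronization between levels.

```python
def level_sets(indptr, indices):
    """Group rows of a lower-triangular CSR matrix into dependency levels."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry: row i needs x[j] first
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_level_set(indptr, indices, data, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(b)
    for rows in level_sets(indptr, indices):
        # Rows in the same level are mutually independent; this inner loop
        # is the part a parallel implementation would distribute.
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag  # assumes every row stores its diagonal
    return x


# Hypothetical example: L = [[2,0,0,0],[0,4,0,0],[1,0,1,0],[0,2,3,5]]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 4.0, 1.0, 1.0, 2.0, 3.0, 5.0]
b = [2.0, 8.0, 4.0, 33.0]
x = sptrsv_level_set(indptr, indices, data, b)
```

In this example rows 0 and 1 have no dependencies and form the first level, row 2 depends on row 0, and row 3 depends on rows 1 and 2, so the solve needs three levels. The number of levels, and how many rows fall in each, is what limits the available parallelism.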
Pages: 129-143
Number of pages: 15