TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs

Cited by: 3
Authors
Lu, Zhengyang [1]
Liu, Weifeng [1]
Affiliations
[1] China Univ Petr, Super Sci Software Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sparse matrix; Sparse triangular solve; Tiled algorithm; GPU;
DOI
10.1007/s42514-023-00151-1
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in sparse basic linear algebra subprograms (BLAS). Compared with sparse matrix-vector multiplication (SpMV), another level-2 sparse BLAS kernel, SpTRSV is generally harder to parallelize well on many-core processors such as GPUs. Much existing work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV, but little of it makes good use of the spatial structure of sparse matrices on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV that optimizes SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. We design two implementations, TileSpTRSV_level-set and TileSpTRSV_sync-free, built on top of the level-set and Sync-free algorithms, respectively. On 16 representative matrices tested on a recent NVIDIA GPU, TileSpTRSV_level-set achieves average speedups of 5.29x (up to 38.10x), 5.33x (up to 21.32x) and 2.62x (up to 12.87x) over the cuSPARSE, Sync-free and Recblock algorithms, respectively.
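For readers unfamiliar with the level-set approach named in the abstract, the following is a minimal serial sketch in Python. It is not the authors' TileSpTRSV and involves no tiling or GPU code. It assumes a lower-triangular matrix stored in CSR form with a nonzero diagonal entry in every row; the function names `level_sets` and `sptrsv_level_set` and the 4x4 example matrix are hypothetical and chosen only for illustration.

```python
def level_sets(row_ptr, col_idx):
    """Group the rows of a lower-triangular CSR matrix into dependency levels."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                # Off-diagonal entry: x[i] needs x[j], so row i sits one level later.
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_level_set(row_ptr, col_idx, vals, b):
    """Solve L x = b level by level (assumes every row stores its diagonal)."""
    x = [0.0] * len(b)
    for rows in level_sets(row_ptr, col_idx):
        # Rows in one level do not depend on each other; a GPU version would
        # process them in parallel and synchronize between levels.
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical example matrix:
#     [2 0 0 0]
# L = [0 4 0 0]
#     [1 0 1 0]
#     [0 2 3 5]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 2.0, 3.0, 5.0]
b = [2.0, 8.0, 4.0, 33.0]

print(level_sets(row_ptr, col_idx))               # [[0, 1], [2], [3]]
print(sptrsv_level_set(row_ptr, col_idx, vals, b))  # [1.0, 2.0, 3.0, 4.0]
```

In this example rows 0 and 1 can be solved concurrently, while rows 2 and 3 each wait for an earlier level. These dependencies and synchronization points are the costs the abstract says level-set and Sync-free work tries to reduce.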
Pages: 129 - 143
Page count: 15
Related papers
50 in total
  • [21] On parallel solvers for sparse triangular systems
    González, P
    Cabaleiro, JC
    Pena, TF
    JOURNAL OF SYSTEMS ARCHITECTURE, 2000, 46 (08) : 675 - 685
  • [22] A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs
    Li, Jiajia
    Lakshminarasimhan, Mahesh
    Wu, Xiaolong
    Li, Ang
    Olschanowsky, Catherine
    Barker, Kevin
    PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, : 403 - 404
  • [23] A Fast Parallel Selection Algorithm on GPUs
    Bakunas-Milanowski, Darius
    Rego, Vernon
    Sang, Janche
    Yu, Chansu
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2015, : 609 - 614
  • [24] Machine learning for optimal selection of sparse triangular system solvers on GPUs
    Dufrechou, Ernesto
    Ezzatti, Pablo
    Freire, Manuel
    Quintana-Orti, Enrique S.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 158 : 47 - 55
  • [25] A high performance two dimensional scalable parallel algorithm for solving sparse triangular systems
    Joshi, MV
    Gupta, A
    Karypis, G
    Kumar, V
    FOURTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING, PROCEEDINGS, 1997, : 137 - 143
  • [26] THE PARALLEL TILED WZ FACTORIZATION ALGORITHM FOR MULTICORE ARCHITECTURES
    Bylina, Beata
    Bylina, Jaroslaw
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2019, 29 (02) : 407 - 419
  • [27] PARALLEL ALGORITHMS FOR SPARSE TRIANGULAR SYSTEM SOLUTION
    KUMAR, PS
    KUMAR, MK
    BASU, A
    PARALLEL COMPUTING, 1993, 19 (02) : 187 - 196
  • [28] OPTIMAL PARALLEL SOLUTION OF SPARSE TRIANGULAR SYSTEMS
    ALVARADO, FL
    SCHREIBER, R
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1993, 14 (02) : 446 - 460
  • [29] swSpTRSV: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures
    Wang, Xinliang
    Liu, Weifeng
    Xue, Wei
    Wu, Li
    ACM SIGPLAN NOTICES, 2018, 53 (01) : 338 - 353
  • [30] Parallel Implementation of Sparse Representation Classifiers for Hyperspectral Imagery on GPUs
    Wu, Zebin
    Wang, Qicong
    Plaza, Antonio
    Li, Jun
    Liu, Jianjun
    Wei, Zhihui
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2015, 8 (06) : 2912 - 2925