TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs

Cited by: 3

Authors:

Lu, Zhengyang [1]

Liu, Weifeng [1]

Affiliations:

[1] China Univ Petr, Super Sci Software Lab, Beijing, Peoples R China

Source:

CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING | 2023 / Vol. 5 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Sparse matrix; Sparse triangular solve; Tiled algorithm; GPU;

DOI:

10.1007/s42514-023-00151-1

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in sparse basic linear algebra subprograms (BLAS). Compared with sparse matrix-vector multiplication (SpMV), another level-2 sparse BLAS kernel, SpTRSV is in general harder to parallelize efficiently on many-core processors such as GPUs. Much recent work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV. However, less work makes good use of the spatial structure of sparse matrices for SpTRSV on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV that optimizes SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. We design two implementations of TileSpTRSV, i.e., TileSpTRSV_level-set and TileSpTRSV_sync-free, on top of the level-set and Sync-free algorithms, respectively. Experimental results on 16 representative matrices on a latest-generation NVIDIA GPU show that TileSpTRSV_level-set gives on average 5.29x (up to 38.10x), 5.33x (up to 21.32x) and 2.62x (up to 12.87x) speedups over the cuSPARSE, Sync-free and Recblock algorithms, respectively.


Pages: 129-143

Number of pages: 15
