AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs

Cited by: 0

Authors:

Hu, Zhengding [1]

Sun, Jingwei [1]

Li, Zhongyang [1]

Sun, Guangzhong [1]

Affiliations:

[1] Univ Sci & Technol China, Comp Sci & Technol, Hefei, Peoples R China

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2024 / Vol. 21 / Issue 04

Keywords:

Sparse matrix; triangular solve; automatic optimization; GPU; SYSTEM SOLVERS; SELECTION;

DOI:

10.1145/3674911

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Sparse Triangular Solve (SpTRSV) has long been an essential kernel in the field of scientific computing. Due to its low computational intensity and internal data dependencies, SpTRSV is hard to implement and optimize on graphics processing units (GPUs). Based on our experimental observations, existing implementations on GPUs fail to achieve optimal performance because of their suboptimal parallelism setups and code implementations, and because they do not account for irregular data distribution. Moreover, their algorithm designs do not adapt to different input matrices, so keeping performance consistent may require substantial manual effort in algorithm redesign and parameter tuning. In this work, we propose AG-SpTRSV, an automatic framework to optimize SpTRSV on GPUs, which provides high performance on various matrices while eliminating the costs of manual design. AG-SpTRSV abstracts the procedures of optimizing an SpTRSV kernel as a scheme and constructs a comprehensive optimization space based on it. By defining a unified code template and preparing code variants, AG-SpTRSV enables fine-grained dynamic parallelism and adaptive code optimizations to handle various tasks. Through computation graph transformation and multi-hierarchy heuristic scheduling, AG-SpTRSV generates schemes for task partitioning and mapping, which effectively address the issues of irregular data distribution and internal data dependencies. AG-SpTRSV searches for the best scheme to optimize the target kernel for the specific matrix. A learned lightweight performance model is also introduced to reduce search costs and provide an efficient end-to-end solution. Experimental results with the SuiteSparse Matrix Collection on NVIDIA Tesla A100 and RTX 3080 Ti show that AG-SpTRSV outperforms state-of-the-art implementations with geometric average speedups of 2.12x to 3.99x. With the performance model enabled, AG-SpTRSV can provide an efficient end-to-end solution, with preprocessing times ranging from 3.4 to 245 times the execution time.
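For readers unfamiliar with the kernel, the following is a minimal serial sketch of SpTRSV and of the "internal data dependencies" the abstract mentions. It is not the AG-SpTRSV method and not GPU code; it assumes a lower-triangular matrix stored in CSR form, and the names `sptrsv_csr_lower` and `dependency_levels` are hypothetical, chosen only for this illustration. Rows that share a dependency level have no dependencies on each other, which is the parallelism a GPU implementation has to find and schedule.

```python
def sptrsv_csr_lower(row_ptr, col_idx, vals, b):
    """Solve L x = b by forward substitution (L lower triangular, CSR)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]            # diagonal entry of row i
            elif j < i:
                acc -= vals[k] * x[j]     # needs x[j], solved in an earlier row
        if diag is None or diag == 0.0:
            raise ValueError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diag
    return x


def dependency_levels(row_ptr, col_idx):
    """Level of row i = 1 + the highest level among the rows it depends on."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level


# Illustrative 3x3 system (made-up values):
#     [2 0 0]       [ 2]
# L = [1 1 0],  b = [ 3]
#     [0 3 4]       [10]
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
vals = [2.0, 1.0, 1.0, 3.0, 4.0]
b = [2.0, 3.0, 10.0]

x = sptrsv_csr_lower(row_ptr, col_idx, vals, b)
levels = dependency_levels(row_ptr, col_idx)
```

In this small example every row depends on the previous one, so each row sits on its own level and nothing can run concurrently. Matrices with many rows per level expose more parallelism, and the spread between those two extremes is one reason a single fixed kernel design may not suit all inputs.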

Pages: 25

[1] TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs
Lu, Zhengyang
Liu, Weifeng
CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2023, 5 (02): 129-143
[2] CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs
Su, Jiya
Zhang, Feng
Liu, Weifeng
He, Bingsheng
Wu, Ruofan
Du, Xiaoyong
Wang, Rujia
PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020
[3] Automatic Selection of Sparse Triangular Linear System Solvers on GPUs through Machine Learning Techniques
Dufrechou, Ernesto
Ezzatti, Pablo
Quintana-Orti, Enrique S.
2019 31ST INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2019), 2019: 41-47