AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs

Cited by: 0
Authors
Hu, Zhengding [1]
Sun, Jingwei [1]
Li, Zhongyang [1]
Sun, Guangzhong [1]
Affiliations
[1] University of Science and Technology of China, Computer Science and Technology, Hefei, People's Republic of China
Keywords
Sparse matrix; triangular solve; automatic optimization; GPU; system solvers; selection
DOI
10.1145/3674911
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse Triangular Solve (SpTRSV) has long been an essential kernel in the field of scientific computing. Due to its low computational intensity and internal data dependencies, SpTRSV is hard to implement and optimize on graphics processing units (GPUs). Based on our experimental observations, existing implementations on GPUs fail to achieve optimal performance due to their suboptimal parallelism setups and code implementations and their lack of consideration of irregular data distribution. Moreover, their algorithm design lacks adaptability to different input matrices, which may require substantial manual effort in algorithm redesign and parameter tuning to keep performance consistent. In this work, we propose AG-SpTRSV, an automatic framework to optimize SpTRSV on GPUs, which provides high performance on various matrices while eliminating the costs of manual design. AG-SpTRSV abstracts the procedures of optimizing an SpTRSV kernel as a scheme and constructs a comprehensive optimization space based on it. By defining a unified code template and preparing code variants, AG-SpTRSV enables fine-grained dynamic parallelism and adaptive code optimizations to handle various tasks. Through computation graph transformation and multi-hierarchy heuristic scheduling, AG-SpTRSV generates schemes for task partitioning and mapping, which effectively address the issues of irregular data distribution and internal data dependencies. AG-SpTRSV searches for the best scheme to optimize the target kernel for the specific matrix. A learned lightweight performance model is also introduced to reduce search costs and provide an efficient end-to-end solution. Experimental results with the SuiteSparse Matrix Collection on NVIDIA Tesla A100 and RTX 3080 Ti show that AG-SpTRSV outperforms state-of-the-art implementations with geometric average speedups of 2.12x to 3.99x. With the performance model enabled, AG-SpTRSV can provide an efficient end-to-end solution, with preprocessing times ranging from 3.4 to 245 times the execution time.
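The "internal data dependencies" mentioned in the abstract can be illustrated with a minimal Python sketch of the SpTRSV kernel itself. This is not AG-SpTRSV's implementation or scheduling scheme. It is a serial forward substitution plus a classic level-set analysis, written under the assumptions that the lower-triangular matrix is stored in CSR format, that column indices within each row are sorted, and that the diagonal entry is the last stored entry of each row. The function names `sptrsv_csr_lower` and `level_sets` are hypothetical.

```python
def sptrsv_csr_lower(indptr, indices, data, b):
    """Solve L x = b by forward substitution.

    Assumes L is lower triangular in CSR format, with sorted column
    indices and a nonzero diagonal stored last in each row.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        start, end = indptr[i], indptr[i + 1]
        acc = b[i]
        # Row i needs every x[j] (j < i) that appears in the row:
        # this is the data dependency that limits parallelism.
        for k in range(start, end - 1):
            acc -= data[k] * x[indices[k]]
        x[i] = acc / data[end - 1]
    return x


def level_sets(indptr, indices):
    """Group rows into levels; rows in the same level are independent."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry: row i waits for row j
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


# Example 4x4 lower-triangular matrix (illustrative data only):
# [[2, 0, 0, 0],
#  [1, 1, 0, 0],
#  [0, 0, 4, 0],
#  [0, 3, 1, 2]]
indptr = [0, 1, 3, 4, 7]
indices = [0, 0, 1, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 2.0]
b = [2.0, 3.0, 8.0, 12.0]

x = sptrsv_csr_lower(indptr, indices, data, b)
levels = level_sets(indptr, indices)
```

In this example rows 0 and 2 have no off-diagonal entries and could be solved concurrently, row 1 must wait for row 0, and row 3 must wait for rows 1 and 2. Uneven level sizes and row lengths of this kind are one common source of the irregular workloads that a GPU implementation has to partition and map.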
Pages: 25
Related papers
4 entries in total
  • [1] TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs
    Lu, Zhengyang
    Liu, Weifeng
    CCF Transactions on High Performance Computing, 2023, 5 (02): 129-143
  • [2] TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs
    Lu, Zhengyang
    Liu, Weifeng
    CCF Transactions on High Performance Computing, 2023, 5: 129-143
  • [3] CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs
    Su, Jiya
    Zhang, Feng
    Liu, Weifeng
    He, Bingsheng
    Wu, Ruofan
    Du, Xiaoyong
    Wang, Rujia
    Proceedings of the 49th International Conference on Parallel Processing (ICPP 2020), 2020
  • [4] Automatic Selection of Sparse Triangular Linear System Solvers on GPUs through Machine Learning Techniques
    Dufrechou, Ernesto
    Ezzatti, Pablo
    Quintana-Orti, Enrique S.
    2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2019), 2019: 41-47