A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver

Cited by: 0
Authors
Ding, Nan [1]
Liu, Yang [2]
Williams, Samuel [1]
Li, Xiaoye S. [2]
Affiliations
[1] Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA 94720, USA
[2] Lawrence Berkeley National Laboratory, Scalable Solvers Group, Berkeley, CA 94720, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Sparse triangular solve (SpTRSV) is used in conjunction with sparse LU factorization to solve sparse linear systems, either as a direct solver or as a preconditioner. As GPUs have become first-class compute citizens, designing an efficient and scalable SpTRSV for multi-GPU HPC systems is imperative. In this paper, we leverage the GPU-initiated data transfers of NVSHMEM to implement and evaluate a multi-GPU SpTRSV. We create a novel producer-consumer paradigm to manage computation and communication in SpTRSV and implement it using two CUDA streams. Our multi-GPU SpTRSV implementation using CUDA streams achieves a 3.7x speedup on twelve GPUs (two nodes) relative to our implementation on a single GPU, and up to a 6.1x speedup over cuSPARSE csrsv2() across one to eighteen GPUs. To further explain the observed performance and to identify the matrix features that determine the potential benefit of using multiple GPUs, we extend the critical path model of SpTRSV to GPUs. We demonstrate that our performance model can explain various aspects of performance and performance bottlenecks on multi-GPU systems and can motivate code optimizations.
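The message-driven, producer-consumer idea behind SpTRSV can be illustrated with a small single-process model. This is a hypothetical sketch, not the authors' NVSHMEM/CUDA implementation: it assumes a lower-triangular matrix L stored in CSR format with every diagonal entry present, and the function name `message_driven_sptrsv` is illustrative. Solving row j "produces" x[j], which is then "sent" to every row that depends on it; a row becomes ready once all of its dependencies have arrived. The per-row `level` it records (the longest dependency chain ending at that row) is only a simplified proxy for the critical path, not the paper's GPU performance model.

```python
from collections import deque


def message_driven_sptrsv(n, indptr, indices, data, b):
    """Solve L x = b for a sparse lower-triangular L stored in CSR.

    Hypothetical sketch: assumes each row stores its diagonal entry and
    only columns j <= i. Returns (x, level), where level[i] is the length
    of the longest dependency chain ending at row i (0 = no dependencies).
    """
    diag = [0.0] * n
    deps = [0] * n                      # unresolved off-diagonal dependencies per row
    consumers = [[] for _ in range(n)]  # column j -> list of (row i, L[i][j])
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j, v = indices[p], data[p]
            if j == i:
                diag[i] = v
            elif j < i:
                deps[i] += 1
                consumers[j].append((i, v))
            else:
                raise ValueError("matrix is not lower triangular")
    if any(d == 0.0 for d in diag):
        raise ValueError("zero or missing diagonal entry")

    x = [0.0] * n
    partial = [0.0] * n   # sum of L[i][j] * x[j] received so far by row i
    level = [0] * n
    ready = deque(i for i in range(n) if deps[i] == 0)
    while ready:
        j = ready.popleft()
        x[j] = (b[j] - partial[j]) / diag[j]    # producer: x[j] is now final
        for i, v in consumers[j]:               # "message" x[j] to each consumer row
            partial[i] += v * x[j]
            level[i] = max(level[i], level[j] + 1)
            deps[i] -= 1
            if deps[i] == 0:                    # all inputs arrived: row i is ready
                ready.append(i)
    return x, level


if __name__ == "__main__":
    # L = [[2,0,0,0],[0,1,0,0],[1,1,4,0],[0,0,2,1]] in CSR form
    x, level = message_driven_sptrsv(
        4, [0, 1, 2, 5, 7], [0, 1, 0, 1, 2, 2, 3],
        [2.0, 1.0, 1.0, 1.0, 4.0, 2.0, 1.0], [2.0, 3.0, 12.0, 7.0])
    print(x, max(level) + 1)  # prints [1.0, 3.0, 2.0, 3.0] 3
```

In this example, rows 0 and 1 have no dependencies and could be solved concurrently, while rows 2 and 3 form a chain, so the dependency DAG has three levels. Matrices with few, wide levels expose more row-level parallelism, which is one intuition for why some matrices benefit more from multiple GPUs than others.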
Pages: 147-159
Number of pages: 13