An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

Cited by: 20
Authors
Huang, He [1]
Wang, Liqiang [1]
Lee, En-Jui [2]
Chen, Po [2]
Affiliations
[1] Univ Wyoming, Dept Comp Sci, Laramie, WY 82071 USA
[2] Univ Wyoming, Dept Geol & Geophys, Laramie, WY 82071 USA
Funding
National Science Foundation (USA)
Keywords
Parallel Scientific Computing; LSQR; MPI; GPU; CUDA; CUSPARSE; CUBLAS; Seismic Tomography; Geoscience
DOI
10.1016/j.procs.2012.04.009
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method for solving large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation of the LSQR solver. At the CUDA level, our contributions are: (1) using CUBLAS and CUSPARSE to compute the major steps of LSQR; (2) optimizing memory copies between host memory and device memory; and (3) developing a CUDA kernel that performs the transpose SpMV (sparse matrix-vector multiplication) without transposing the matrix in memory or keeping an additional copy of it. At the MPI level, our contributions are: (1) decomposing both the matrix and the vector to increase parallelism; and (2) designing a static load-balancing strategy. In our experiments, the single-GPU code achieves up to a 17.6x speedup (15.7 GFlops) in single precision and a 15.2x speedup (12.0 GFlops) in double precision over the original serial CPU code. On 135 MPI tasks, the MPI-GPU code achieves up to a 3.7x speedup (268 GFlops) in single precision and a 3.8x speedup (223 GFlops) in double precision over the corresponding MPI-CPU code. The MPI-GPU code scales in both strong and weak scaling tests. In addition, our parallel implementations outperform the LSQR subroutine in the PETSc library.
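LSQR needs one product with A and one with A^T in every iteration, which is why contribution (3) above matters. The following is a minimal pure-Python sketch of that structure, not the authors' CUDA implementation. It assumes CSR storage; the `CSR` class and the `spmv`, `spmv_transpose`, `norm`, and `lsqr` functions are hypothetical names chosen for illustration. `spmv_transpose` computes A^T x directly from A's own CSR arrays by scattering each row's contributions, which is one common way to avoid a transposed copy; whether the paper's kernel works this way is an assumption. The loop follows the Paige-Saunders LSQR recurrences with simplified stopping tests. Its vector updates and norms are BLAS level-1 operations (scal, axpy, nrm2) of the kind CUBLAS provides, and the products with A are the SpMV step that CUSPARSE provides.

```python
import math


class CSR:
    """Hypothetical minimal CSR container: row i owns entries row_ptr[i]:row_ptr[i+1]."""

    def __init__(self, n_rows, n_cols, row_ptr, col_idx, vals):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.row_ptr, self.col_idx, self.vals = row_ptr, col_idx, vals


def spmv(A, x):
    """y = A x. Each row yields exactly one output entry, so rows are independent."""
    y = [0.0] * A.n_rows
    for i in range(A.n_rows):
        s = 0.0
        for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
            s += A.vals[k] * x[A.col_idx[k]]
        y[i] = s
    return y


def spmv_transpose(A, x):
    """y = A^T x computed from A's own CSR arrays, with no transposed copy.

    Row i scatters vals[k] * x[i] into y[col_idx[k]]. If rows were processed
    in parallel, different rows could hit the same y entry, so the additions
    would have to be atomic (e.g. atomicAdd in CUDA).
    """
    y = [0.0] * A.n_cols
    for i in range(A.n_rows):
        xi = x[i]
        for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
            y[A.col_idx[k]] += A.vals[k] * xi
    return y


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def lsqr(A, b, max_iter=100, tol=1e-12):
    """Minimize ||A x - b||_2 with the LSQR recurrences (simplified stopping tests)."""
    x = [0.0] * A.n_cols
    bnorm = beta = norm(b)
    if beta == 0.0:
        return x
    u = [t / beta for t in b]
    v = spmv_transpose(A, u)
    alpha = norm(v)
    if alpha == 0.0:
        return x
    v = [t / alpha for t in v]
    w = v[:]
    phibar, rhobar = beta, alpha
    for _ in range(max_iter):
        # Golub-Kahan bidiagonalization: one SpMV and one transpose SpMV.
        u = [p - alpha * q for p, q in zip(spmv(A, v), u)]
        beta = norm(u)
        if beta > 0.0:
            u = [t / beta for t in u]
        v = [p - beta * q for p, q in zip(spmv_transpose(A, u), v)]
        alpha = norm(v)
        if alpha > 0.0:
            v = [t / alpha for t in v]
        # Plane rotation that eliminates beta from the bidiagonal matrix.
        rho = math.hypot(rhobar, beta)
        if rho == 0.0:
            break
        c, s = rhobar / rho, beta / rho
        theta, rhobar = s * alpha, -c * alpha
        phi, phibar = c * phibar, s * phibar
        # Update the solution and the search direction (axpy-style vector work).
        x = [xi + (phi / rho) * wi for xi, wi in zip(x, w)]
        w = [vi - (theta / rho) * wi for vi, wi in zip(v, w)]
        # phibar estimates ||r||; phibar * alpha * |c| estimates ||A^T r||.
        if phibar <= tol * bnorm or phibar * alpha * abs(c) <= tol * bnorm:
            break
    return x


if __name__ == "__main__":
    # Rows (1,0), (0,1), (1,1); b = (1, 2, 3) has the exact solution x = (1, 2).
    A = CSR(3, 2, [0, 1, 2, 4], [0, 1, 0, 1], [1.0, 1.0, 1.0, 1.0])
    x = lsqr(A, [1.0, 2.0, 3.0])
    print([round(t, 6) for t in x])  # expected: [1.0, 2.0]
```

In a parallel version, the row loop of `spmv` has no write conflicts, whereas the scatter in `spmv_transpose` trades the memory of a transposed copy for concurrent writes to the same output entries, which must be made atomic or avoided by some other scheme.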
Pages: 76-85
Number of pages: 10
Related papers (50 in total)
  • [1] AN MPI-CUDA IMPLEMENTATION FOR THE COMPRESSION OF DEM
    Zeng, Fei
    Liang, Fahong
    Yang, Fan
    Kou, Cheng
    2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016 : 2552 - 2555
  • [2] LSQR - AN ALGORITHM FOR SPARSE LINEAR-EQUATIONS AND SPARSE LEAST-SQUARES
    PAIGE, CC
    SAUNDERS, MA
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1982, 8 (01) : 43 - 71
  • [3] ALGORITHM-583 - LSQR - SPARSE LINEAR-EQUATIONS AND LEAST-SQUARES PROBLEMS
    PAIGE, CC
    SAUNDERS, MA
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1982, 8 (02) : 195 - 209
  • [4] Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm
    Khaled, Heba
    Faheem, Hossam El Deen Mostafa
    El Gohary, Rania
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 12 (03) : 313 - 327
  • [5] Massively parallel modeling of electromagnetic field in conductive media: An MPI-CUDA implementation on Multi-GPU computers
    Tu, Xiaolei
    Bowles-Martinez, Esteban Jeremy
    Schultz, Adam
    COMPUTERS & GEOSCIENCES, 2024, 192
  • [6] A Combined MPI-CUDA Parallel Solution of Linear and Nonlinear Poisson-Boltzmann Equation
    Colmenares, Jose
    Galizia, Antonella
    Ortiz, Jesus
    Clematis, Andrea
    Rocchia, Walter
    BIOMED RESEARCH INTERNATIONAL, 2014, 2014
  • [7] Some properties of LSQR for large sparse linear least squares problems
    Jia, Zhongxiao
    Journal of Systems Science and Complexity, 2010, 23 : 815 - 821
  • [8] Some properties of LSQR for large sparse linear least squares problems
    Jia, Zhongxiao
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2010, 23 (04) : 815 - 821
  • [9] Parallel QR Factorization using Givens Rotations in MPI-CUDA for Multi-GPU
    Tapia-Romero, Miguel
    Meneses-Viveros, Amilcar
    Hernandez-Rubio, Erika
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (05) : 636 - 645