Parallel ILU preconditioners in GPU computation

Cited by: 6
Authors
Chen, Yan [3]
Tian, Xuhong [3]
Liu, Hui [1]
Chen, Zhangxin [1]
Yang, Bo [1]
Liao, Wenyuan [2]
Zhang, Peng [4]
He, Ruijian [1]
Yang, Min [1]
Affiliations
[1] Univ Calgary, Dept Chem & Petr Engn, Calgary, AB T2N 1N4, Canada
[2] Univ Calgary, Dept Math & Stat, Calgary, AB T2N 1N4, Canada
[3] South China Agr Univ, Coll Math & Informat, Guangzhou 510642, Guangdong, Peoples R China
[4] SUNY Stony Brook, Dept Biomed Engn, Stony Brook, NY 11794 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
ILU; Block-wise matrix; Parallel computing; GPU; Preconditioner;
DOI
10.1007/s00500-017-2764-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Accelerating large-scale linear solvers is crucial for scientific research and industrial applications, and preconditioners play a key role in improving the performance of iterative linear solvers. This paper summarizes and reviews our work on developing parallel ILU preconditioners on GPUs. The mechanisms of ILU(0), ILU(k), ILUT, enhanced ILUT, and block-wise ILU(k) are reviewed and analyzed, providing clear guidance for the development of iterative linear solvers. ILU(0) is the most commonly used preconditioner; the nonzero pattern of its preconditioner matrix is identical to that of the original matrix to be solved. ILU(k) uses the level k to control the pattern of its preconditioner matrix. ILUT selects entries for its preconditioner matrix by thresholds, without regard to the pattern of the original matrix. In addition to these point-wise ILU preconditioners, a block-wise ILU(k) preconditioner is carefully designed to support block-wise matrices. In the implementation, the RAS (Restricted Additive Schwarz) method is adopted to optimize the parallel structure of the preconditioner matrix. Because the configuration parameters of the ILU preconditioners complicate the parallel solution process, decoupled algorithms are adopted. These algorithms are implemented and tested on NVIDIA GPUs. The experimental results show that a single-GPU implementation can speed up an ILU preconditioner by a factor of 10 compared with a traditional CPU implementation. The results also show that ILU(0) achieves a better speedup than ILU(k) but converges more slowly. The level k of ILU(k) and the threshold pair (p,t) of ILUT are effective parameters for controlling the balance between acceleration and convergence for ILU(k) and ILUT, respectively. All of these ILU preconditioners are characterized and compared in this work, giving practitioners a clear picture of the ILU family along with numerical insights.
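The ILU(0) property stated in the abstract (the factors keep exactly the nonzero pattern of the original matrix) can be illustrated with a minimal serial sketch. This is not the authors' GPU implementation: it uses dense Python lists for readability, assumes nonzero pivots, and the names `ilu0` and `matmul` are hypothetical helpers introduced only for this illustration.

```python
def ilu0(a):
    """Serial ILU(0) sketch: returns (lower, upper) with unit-diagonal lower.

    Updates are applied only at positions where `a` is nonzero, so no
    fill-in is created outside the original pattern.
    Assumption: every pivot lu[k][k] is nonzero.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    pattern = [[v != 0 for v in row] for row in a]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue  # entry outside the pattern: skip, no fill-in
            lu[i][k] /= lu[k][k]  # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i][j]:  # update only entries inside the pattern
                    lu[i][j] -= lu[i][k] * lu[k][j]
    lower = [[lu[i][j] if j < i else (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    upper = [[lu[i][j] if j >= i else 0.0 for j in range(n)]
             for i in range(n)]
    return lower, upper


def matmul(x, y):
    """Dense matrix product of two square list-of-lists matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Arrow-shaped example: an exact LU factorization would fill in the zeros.
arrow = [[4, 1, 1, 1],
         [1, 4, 0, 0],
         [1, 0, 4, 0],
         [1, 0, 0, 4]]
lower, upper = ilu0(arrow)
product = matmul(lower, upper)
```

On this example, `product` agrees with `arrow` at every nonzero position of `arrow` and differs only at the positions where fill-in was discarded, which is what makes the factorization incomplete. ILU(k) and ILUT, as described in the abstract, relax the fixed pattern by admitting entries according to a level k or thresholds (p,t).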
Pages: 8187 - 8205
Number of pages: 19
Related papers
50 in total
  • [1] Parallel ILU preconditioners in GPU computation
    Yan Chen
    Xuhong Tian
    Hui Liu
    Zhangxin Chen
    Bo Yang
    Wenyuan Liao
    Peng Zhang
    Ruijian He
    Min Yang
    [J]. Soft Computing, 2018, 22 : 8187 - 8205
  • [2] Design, Tuning and Evaluation of Parallel Multilevel ILU Preconditioners
    Aliaga, Jose I.
    Bollhoefer, Matthias
    Martin, Alberto F.
    Quintana-Orti, Enrique S.
    [J]. HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2008, 2008, 5336 : 314 - +
  • [3] Distributed block independent set algorithms and parallel multilevel ILU preconditioners
    Shen, C
    Zhang, J
    Wang, K
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2005, 65 (03) : 331 - 346
  • [4] Parallel performance of block ILU preconditioners for a block-tridiagonal matrix
    Yun, JH
    [J]. JOURNAL OF SUPERCOMPUTING, 2003, 24 (01) : 69 - 89
  • [5] Block and full matrix ILU preconditioners for parallel finite element solvers
    Wille, SO
    Staff, O
    Loula, AFD
    [J]. COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2002, 191 (13-14) : 1381 - 1394
  • [6] Parallel Performance of Block ILU Preconditioners for a Block-tridiagonal Matrix
    Jae Heon Yun
    [J]. The Journal of Supercomputing, 2003, 24 (1) : 69 - 89
  • [7] Performance comparison of parallel ILU preconditioners for the incompressible Navier-Stokes equations
    Sungwoo Kang
    Long Cu Ngo
    Hyounggwon Choi
    Wanjin Chung
    Yo-Han Yoo
    Jung Yul Yoo
    [J]. Journal of Mechanical Science and Technology, 2020, 34 : 1175 - 1184
  • [8] Performance comparison of parallel ILU preconditioners for the incompressible Navier-Stokes equations
    Kang, Sungwoo
    Ngo, Long Cu
    Choi, Hyounggwon
    Chung, Wanjin
    Yoo, Yo-Han
    Yoo, Jung Yul
    [J]. JOURNAL OF MECHANICAL SCIENCE AND TECHNOLOGY, 2020, 34 (03) : 1175 - 1184
  • [9] Experimental study of ILU preconditioners for indefinite matrices
    Chow, E
    Saad, Y
    [J]. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1997, 86 (02) : 387 - 414
  • [10] The Gravity Parallel Computation Based on GPU
    Wang Kefan
    Li Ge
    [J]. PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 2409 - 2413