Parallel ILU preconditioners in GPU computation

Authors
Yan Chen
Xuhong Tian
Hui Liu
Zhangxin Chen
Bo Yang
Wenyuan Liao
Peng Zhang
Ruijian He
Min Yang
Affiliations
[1] University of Calgary, Department of Chemical and Petroleum Engineering
[2] University of Calgary, Department of Mathematics and Statistics
[3] South China Agricultural University, College of Mathematics and Informatics
[4] Stony Brook University, Biomedical Engineering Department
Source
Soft Computing | 2018 / Vol. 22
Keywords
ILU; Block-wise matrix; Parallel computing; GPU; Preconditioner
DOI
Not available
Abstract
Accelerating large-scale linear solvers is crucial for scientific research and industrial applications, and preconditioners play a key role in the performance of iterative linear solvers. This paper summarizes and reviews our work on developing parallel ILU preconditioners on GPUs. The mechanisms of ILU(0), ILU(k), ILUT, enhanced ILUT, and block-wise ILU(k) are reviewed and analyzed to give clear guidance for developing iterative linear solvers. ILU(0) is the most commonly used preconditioner; the nonzero pattern of its preconditioner matrix is exactly that of the original matrix to be solved. ILU(k) uses a fill level k to control the pattern of its preconditioner matrix. ILUT selects the entries of its preconditioner matrix by thresholds, without regard to the original matrix pattern. In addition to these point-wise ILU preconditioners, a block-wise ILU(k) preconditioner is carefully designed to support block-wise matrices. In the implementation, the Restricted Additive Schwarz (RAS) method is adopted to optimize the parallel structure of the preconditioner matrix. Combined with the configuration parameters of the ILU preconditioners, this makes the parallel solution process complex, so decoupled algorithms are adopted. These algorithms are implemented and tested on NVIDIA GPUs. The experimental results show that a single-GPU implementation can speed up an ILU preconditioner by a factor of 10 compared with a traditional CPU implementation. They also show that ILU(0) achieves a better speedup than ILU(k) but converges more slowly. The level k of ILU(k) and the threshold pair (p, t) of ILUT are effective parameters for balancing acceleration against convergence in ILU(k) and ILUT, respectively. All of these ILU preconditioners are characterized and compared in this work, giving practitioners a clear picture of, and numerical insight into, the ILU family.
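The ILU(0) rule described in the abstract (the factors keep exactly the nonzero pattern of the original matrix) can be illustrated with a minimal sketch. This is not the authors' GPU implementation: it is a sequential, dense NumPy version for small matrices, and the names `ilu0` and `a_demo` are hypothetical, chosen only for this illustration. It assumes no zero pivot is encountered.

```python
import numpy as np


def ilu0(a):
    """Sketch of ILU(0): incomplete LU restricted to the nonzero pattern of a.

    Assumes a is a small square matrix with nonzero pivots (no pivoting is done).
    Returns a unit lower triangular factor and an upper triangular factor.
    """
    lu = np.array(a, dtype=float)
    n = lu.shape[0]
    pattern = lu != 0  # sparsity pattern of the original matrix, kept fixed

    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue  # entry outside the pattern: no multiplier is formed
            lu[i, k] /= lu[k, k]  # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i, j]:  # update only entries already in the pattern
                    lu[i, j] -= lu[i, k] * lu[k, j]

    lower = np.tril(lu, -1) + np.eye(n)
    upper = np.triu(lu)
    return lower, upper


# Hypothetical test matrix: 5-point Laplacian on a 2 x 2 grid.
# A full LU factorization would create fill-in at positions (1, 2) and (2, 1).
a_demo = np.array(
    [
        [4.0, -1.0, -1.0, 0.0],
        [-1.0, 4.0, 0.0, -1.0],
        [-1.0, 0.0, 4.0, -1.0],
        [0.0, -1.0, -1.0, 4.0],
    ]
)
lower, upper = ilu0(a_demo)
residual = lower @ upper - a_demo  # zero on the pattern, nonzero only off it
```

In this sketch the product of the two factors reproduces the original matrix on its nonzero pattern, and the error is confined to positions where fill-in was discarded. ILU(k) and ILUT relax the `pattern[i, j]` test, by fill level and by threshold respectively, which is the trade-off between acceleration and convergence that the abstract describes.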
Pages: 8187 - 8205
Page count: 18