A multi-GPU parallel optimization model for the preconditioned conjugate gradient algorithm

Cited by: 17
Authors
Gao, Jiaquan [1, 4]
Zhou, Yuanshen [2]
He, Guixia [3]
Xia, Yifei [1]
Affiliations
[1] Nanjing Normal Univ, Sch Comp Sci & Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Zhejiang Univ Technol, Coll Comp Sci & Technol, Hangzhou 310023, Zhejiang, Peoples R China
[3] Zhejiang Univ Technol, Zhijiang Coll, Hangzhou 310024, Zhejiang, Peoples R China
[4] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
Keywords
Optimization model; Preconditioned conjugate gradient algorithm; CUDA; Multiple GPUs; MATRIX-VECTOR MULTIPLICATION; PERFORMANCE; SOLVERS; SPMV
DOI
10.1016/j.parco.2017.04.003
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this study, we present a novel optimization model that can automatically and rapidly generate an optimal parallel preconditioned conjugate gradient (PCG) algorithm for any given linear system on a specific multi-GPU (graphics processing unit) platform. The proposed model has three novel features: (1) it provides a profile-based performance model for each of the main components of the PCG algorithm, namely the vector operation, the inner product, and the sparse matrix-vector multiplication (SpMV); (2) it is general, independent of the problem, and depends only on the resources of the devices; and (3) it is extensible: any vector operation, inner product, or SpMV kernel that is not included in our framework can be incorporated once its performance model has been successfully constructed. The model is constructed only once for each type of GPU. Experiments validate the high efficiency of the proposed model. (C) 2017 Elsevier B.V. All rights reserved.
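For orientation, the three PCG components named in the abstract (vector operation, inner product, SpMV) can be seen in a textbook, single-device version of the algorithm. The sketch below is not the authors' multi-GPU implementation: it is serial pure Python, it assumes a Jacobi (diagonal) preconditioner and CSR storage, and every function name (`spmv`, `dot`, `axpy`, `pcg`) is hypothetical, since the record itself specifies none of these details.

```python
# Illustrative sketch only: serial PCG with an assumed Jacobi preconditioner.
# All names and the CSR layout are assumptions, not taken from the paper.

def spmv(indptr, indices, data, x):
    """SpMV component: y = A x for a matrix A stored in CSR form."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]

def dot(u, v):
    """Inner-product component."""
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, x, y):
    """Vector-operation component: alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]

def pcg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    n = len(b)
    # Jacobi preconditioner: M^{-1} is the inverse of the diagonal of A.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = 1.0 / data[k]

    x = [0.0] * n
    r = list(b)                                   # r = b - A*0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing the test by zero

    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, it
        q = spmv(indptr, indices, data, p)        # one SpMV per iteration
        alpha = rz / dot(p, q)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, q, r)
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)               # p = z + beta * p
        rz = rz_new
    return x, max_iter

# Usage: a 3x3 symmetric positive definite tridiagonal system in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = spmv(indptr, indices, data, [1.0, 2.0, 3.0])
solution, iterations = pcg(indptr, indices, data, b)
```

Each iteration costs one SpMV, two inner products (three with the residual check), and three vector updates, which is consistent with the abstract's choice to model those three kernel types separately.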
Pages: 1-16
Number of pages: 16