Multi-GPU Implementation of LU Factorization

Cited by: 8
Authors
Jia, Yulu [1]
Luszczek, Piotr [1]
Dongarra, Jack [1]
Affiliations
[1] Univ Tennessee, Knoxville, TN 37996 USA
Keywords
LU factorization; hardware accelerators; hybrid; multi-core multi-GPU; MODEL; SET
DOI
10.1016/j.procs.2012.04.012
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
LU factorization is the most computationally intensive step in solving systems of linear equations. Once the LU factorization of the coefficient matrix has been obtained, the system can be solved readily using forward and backward substitution. The computational cost of LU factorization, in terms of floating-point operations, is cubic in the matrix dimension. There have been various efforts to improve the performance of LU factorization. We propose a multi-core multi-GPU hybrid LU factorization algorithm that leverages the strengths of both multiple CPUs and multiple GPUs. Our algorithm uses some of the CPU cores for panel factorization, and the rest of the CPU cores together with all the available GPUs for trailing submatrix updates. Our algorithm employs both dynamic scheduling and static scheduling. Experiments show that our approach reaches 1134 Gflop/s with 4 Fermi GPU boards combined with a total of 48 AMD CPU cores. This is the first time such a level of performance has been reported in a shared-memory environment. Execution traces show that our code also achieves good load balance and high system utilization.
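The split between panel factorization and trailing submatrix update that the abstract describes can be illustrated with a small sequential sketch. This is not the authors' multi-GPU code: it is a plain-Python, single-threaded, right-looking blocked LU with partial pivoting, and the names `blocked_lu`, `lu_solve`, and the block-size parameter `nb` are hypothetical, chosen only for this illustration. The comments mark which step the abstract assigns to which hardware.

```python
def blocked_lu(a, nb=2):
    """Right-looking blocked LU with partial pivoting (illustrative sketch).

    Returns (lu, perm). lu stores the unit-lower factor L below the diagonal
    and U on and above it; perm[i] is the original index of the row that ends
    up in position i. The input matrix is not modified.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(0, n, nb):
        end = min(k + nb, n)
        # Panel factorization on columns k..end-1 (the step the abstract
        # assigns to a subset of the CPU cores).
        for j in range(k, end):
            p = max(range(j, n), key=lambda i: abs(lu[i][j]))
            if lu[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:
                lu[j], lu[p] = lu[p], lu[j]          # swap whole rows
                perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                lu[i][j] /= lu[j][j]                 # multiplier, entry of L
                for c in range(j + 1, end):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # Triangular solve for the block row of U to the right of the panel.
        for j in range(k, end):
            for i in range(j + 1, end):
                for c in range(end, n):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # Trailing submatrix update A22 -= L21 * U12 (the step the abstract
        # assigns to the remaining CPU cores and all GPUs).
        for i in range(end, n):
            for c in range(end, n):
                lu[i][c] -= sum(lu[i][j] * lu[j][c] for j in range(k, end))
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b from the factors: forward, then backward substitution."""
    n = len(lu)
    x = [float(b[p]) for p in perm]                  # apply the row permutation
    for i in range(n):                               # forward: L y = P b
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):                     # backward: U x = y
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x
```

In this sketch the trailing update is a matrix-matrix product that accounts for most of the cubic operation count, which is consistent with the abstract assigning it to the GPUs, while the panel step involves pivot searches and row swaps on a narrow block of columns.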
Pages: 106-115
Number of pages: 10
Related papers
50 in total
  • [1] MAPREDUCE IMPLEMENTATION WITH MULTI-GPU
    Chen, Yi
    Chen, Su
    Jiang, Hai
    [J]. INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE & TECHNOLOGY: PROCEEDINGS, 2012, : 21 - 25
  • [2] Efficient Implementation of MrBayes on Multi-GPU
    Bao, Jie
    Xia, Hongju
    Zhou, Jianfu
    Liu, Xiaoguang
    Wang, Gang
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2013, 30 (06) : 1471 - 1479
  • [3] Scalable multi-GPU implementation of the MAGFLOW simulator
    Rustico, Eugenio
    Bilotta, Giuseppe
    Herault, Alexis
    Del Negro, Ciro
    Gallo, Giovanni
    [J]. ANNALS OF GEOPHYSICS, 2011, 54 (05) : 592 - 599
  • [4] Towards a Multi-GPU Implementation of a Seismic Application
    Rigon, Pedro H. C.
    Schussler, Brenda S.
    Padoin, Edson L.
    Lorenzon, Arthur F.
    Carissimi, Alexandre
    Navaux, Philippe O. A.
    [J]. HIGH PERFORMANCE COMPUTING, CARLA 2023, 2024, 1887 : 146 - 159
  • [5] Multi-GPU Implementation of the NICAM Atmospheric Model
    Demeshko, Irina
    Maruyama, Naoya
    Tomita, Hirofumi
    Matsuoka, Satoshi
    [J]. EURO-PAR 2012: PARALLEL PROCESSING WORKSHOPS, 2013, 7640 : 175 - 184
  • [6] Multi-GPU implementation of the lattice Boltzmann method
    Obrecht, Christian
    Kuznik, Frederic
    Tourancheau, Bernard
    Roux, Jean-Jacques
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2013, 65 (02) : 252 - 261
  • [7] A Multi-GPU Implementation of a Cellular Genetic Algorithm
    Vidal, Pablo
    Alba, Enrique
    [J]. 2010 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2010,
  • [8] A Multi-GPU PCISPH Implementation with Efficient Memory Transfers
    Verma, Kevin
    Peng, Chong
    Szewc, Kamil
    Wille, Robert
    [J]. 2018 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2018,
  • [9] Benchmarking multi-GPU applications on modern multi-GPU integrated systems
    Bernaschi, Massimo
    Agostini, Elena
    Rossetti, Davide
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (14):
  • [10] Multi-GPU Implementation of k-Nearest Neighbor Algorithm
    Masek, Jan
    Burget, Kadim
    Karasek, Jan
    Uher, Vaclav
    Dutta, Malay Kishore
    [J]. 2015 38TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2015, : 764 - 767