Multi-GPU Implementation of LU Factorization

Cited by: 8

Authors:

Jia, Yulu [1]

Luszczek, Piotr [1]

Dongarra, Jack [1]

Affiliations:

[1] Univ Tennessee, Knoxville, TN 37996 USA

Source:

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012 | 2012, Vol. 9

Keywords:

LU factorization; hardware accelerators; hybrid; multi-core multi-GPU; MODEL; SET

DOI:

10.1016/j.procs.2012.04.012

Chinese Library Classification (CLC) number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

LU factorization is the most computationally intensive step in solving systems of linear equations. Once the LU factorization of the coefficient matrix is available, the system can be solved readily by forward and backward substitution. The cost of LU factorization, measured in floating-point operations, is cubic in the matrix dimension. Many efforts have been made to improve its performance. We propose a hybrid multi-core, multi-GPU LU factorization algorithm that leverages the strengths of both multiple CPUs and multiple GPUs. The algorithm uses some of the CPU cores for panel factorization, and the remaining CPU cores together with all available GPUs for trailing submatrix updates. It employs both dynamic and static scheduling. Experiments show that our approach reaches 1134 Gflop/s with 4 Fermi GPU boards combined with a total of 48 AMD CPU cores. This is the first time such a level of performance has been reported in a shared-memory environment. Execution traces show that our code also achieves good load balance and high system utilization.
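The two phases named in the abstract, panel factorization and trailing submatrix update, can be illustrated with a minimal sequential sketch of a blocked (right-looking) LU factorization with partial pivoting. This is not the authors' implementation: it runs on a single core in pure Python, and the names `blocked_lu`, `solve`, and the block size `nb` are hypothetical, chosen only for illustration. In a hybrid scheme like the one described, the panel step would run on a subset of CPU cores while the trailing update, which dominates the roughly 2n^3/3 floating-point operations for an n-by-n matrix, would be spread over the remaining cores and the GPUs.

```python
# Illustrative sketch only (hypothetical names, not the paper's code):
# sequential blocked LU with partial pivoting, then a triangular solve.

def blocked_lu(a, nb=2):
    """Factor a square matrix so that P*A = L*U, using block size nb.

    Returns (lu, piv). lu stores L strictly below the diagonal (unit
    diagonal implied) and U on and above it. piv[i] is the index of the
    original row that ends up in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy
    piv = list(range(n))
    for k in range(0, n, nb):
        end = min(k + nb, n)            # panel covers columns k .. end-1
        # 1) Panel factorization: unblocked LU on the tall panel.
        for j in range(k, end):
            p = max(range(j, n), key=lambda i: abs(lu[i][j]))
            if lu[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:                  # swap whole rows, as LAPACK does
                lu[j], lu[p] = lu[p], lu[j]
                piv[j], piv[p] = piv[p], piv[j]
            for i in range(j + 1, n):
                lu[i][j] /= lu[j][j]
                for c in range(j + 1, end):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # 2) Row block of U: forward-substitute with the unit-lower panel block.
        for j in range(k, end):
            for i in range(j + 1, end):
                for c in range(end, n):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # 3) Trailing submatrix update: A22 -= L21 * U12 (the bulk of the work).
        for i in range(end, n):
            for j in range(k, end):
                lij = lu[i][j]
                for c in range(end, n):
                    lu[i][c] -= lij * lu[j][c]
    return lu, piv


def solve(lu, piv, b):
    """Solve A*x = b given the factors returned by blocked_lu."""
    n = len(lu)
    x = [b[p] for p in piv]             # apply the row permutation to b
    for i in range(n):                  # forward substitution (L, unit diagonal)
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):        # backward substitution (U)
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

For example, calling `blocked_lu(a, nb=2)` on a small nonsingular matrix `a` and passing the result to `solve` together with a right-hand side `b` recovers the solution of `a x = b` up to rounding error. The block size only changes how the work is grouped, not the result, which is what allows the trailing update to be handed to whichever compute units are available.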

Pages: 106-115

Page count: 10
