A NOVEL PARALLEL QR ALGORITHM FOR HYBRID DISTRIBUTED MEMORY HPC SYSTEMS

Cited by: 18

Authors:

Granat, Robert [1, 2]

Kagstrom, Bo [1, 2]

Kressner, Daniel [3]

Affiliations:

[1] Umea Univ, Dept Comp Sci, SE-90187 Umea, Sweden

[2] Umea Univ, HPC2N, SE-90187 Umea, Sweden

[3] ETH, Seminar Appl Math, CH-8092 Zurich, Switzerland

Source:

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2010 / Vol. 32 / Issue 4

Funding:

Swedish Research Council;

Keywords:

eigenvalue problem; nonsymmetric QR algorithm; multishift; bulge chasing; parallel computations; level 3 performance; aggressive early deflation; parallel algorithms; hybrid distributed memory systems; ALGEBRAIC RICCATI EQUATION; LEVEL 3 BLAS; MATRIX; REDUCTION; DEFLATION; SOFTWARE; SHIFTS;

DOI:

10.1137/090756934

Chinese Library Classification (CLC) code:

O29 [Applied Mathematics];

Discipline classification code:

070104 ;

Abstract:

A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed memory high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are used to port the codes to distributed memory platforms with multithreaded nodes, such as nodes built from multicore processors. Numerous numerical experiments, including a number of examples from applications, confirm the superior performance of our parallel QR algorithm compared with the existing ScaLAPACK code: for sufficiently large problems, the new implementation is one to two orders of magnitude faster.
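The abstract says the multiwindow approach lets most of the bulge-chasing work run as level 3 BLAS. The general idea behind this kind of scheme works as follows. Instead of applying each small reflector of a bulge chase to the whole matrix, which costs one rank-1 (level 2) update per reflector, the reflectors are accumulated in a small orthogonal matrix. That matrix is then applied once through matrix-matrix products. The NumPy sketch below illustrates only this accumulation idea, and it rests on several assumptions:

- The random 3x3 reflectors stand in for those a real double-shift bulge chase would compute.
- The function names are hypothetical.
- For simplicity, the whole window rows and columns are updated with GEMM. A real implementation would update the diagonal window block during the chase and delay only the off-window parts.

It is not the paper's parallel implementation.

```python
import numpy as np


def make_reflector_chain(k, m, rng):
    """Hypothetical stand-in for a bulge chase: k random 3x3 Householder
    reflectors P_j = I - beta_j v_j v_j^T, the j-th acting on window
    rows/columns j, j+1, j+2 (the chain moves down one position per step)."""
    if k > m - 2:
        raise ValueError("a window of size m holds at most m - 2 steps")
    chain = []
    for j in range(k):
        v = rng.standard_normal(3)
        chain.append((j, v, 2.0 / (v @ v)))
    return chain


def apply_one_by_one(A, s, chain):
    """Apply each reflector as a similarity transformation to the full
    matrix, one rank-1 (level 2) update per side and per reflector.
    s is the row/column offset of the window inside A."""
    A = A.copy()
    for pos, v, beta in chain:
        idx = slice(s + pos, s + pos + 3)
        A[idx, :] -= beta * np.outer(v, v @ A[idx, :])   # A <- P_j A
        A[:, idx] -= beta * np.outer(A[:, idx] @ v, v)   # A <- A P_j
    return A


def apply_accumulated(A, s, m, chain):
    """Accumulate Q = P_1 P_2 ... P_k in a small m x m matrix U, then
    update the window rows and columns of A with two matrix-matrix
    products (level 3), giving Q^T A Q."""
    A = A.copy()
    U = np.eye(m)
    for pos, v, beta in chain:
        cols = slice(pos, pos + 3)
        U[:, cols] -= beta * np.outer(U[:, cols] @ v, v)  # U <- U P_j
    win = slice(s, s + m)
    A[win, :] = U.T @ A[win, :]   # window rows:    Q^T A
    A[:, win] = A[:, win] @ U     # window columns: (Q^T A) Q
    return A
```

Both routines produce the same similarity transformation up to rounding. The second touches the large off-window blocks only through two matrix-matrix products, and that is what allows an optimized BLAS to make good use of the memory hierarchy. A random dense matrix is used here instead of an upper Hessenberg one because the equivalence is purely algebraic.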


Pages: 2345-2378

Number of pages: 34
