A NOVEL PARALLEL QR ALGORITHM FOR HYBRID DISTRIBUTED MEMORY HPC SYSTEMS

Cited by: 18

Authors:

Granat, Robert [1,2]

Kagstrom, Bo [1,2]

Kressner, Daniel [3]

Affiliations:

[1] Umea Univ, Dept Comp Sci, SE-90187 Umea, Sweden

[2] Umea Univ, HPC2N, SE-90187 Umea, Sweden

[3] ETH, Seminar Appl Math, CH-8092 Zurich, Switzerland

Source:

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2010 / Vol. 32 / No. 04

Funding:

Swedish Research Council;

Keywords:

eigenvalue problem; nonsymmetric QR algorithm; multishift; bulge chasing; parallel computations; level 3 performance; aggressive early deflation; parallel algorithms; hybrid distributed memory systems; ALGEBRAIC RICCATI EQUATION; LEVEL 3 BLAS; MATRIX; REDUCTION; DEFLATION; SOFTWARE; SHIFTS;

DOI:

10.1137/090756934

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Subject classification code:

070104 ;

Abstract:

A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.


Pages: 2345 - 2378

Number of pages: 34

50 related records in total

[11] Multi-step Parallel PNN Algorithm for Distributed-Memory Systems
Wakatani, Akiyoshi
COMPUTER AND INFORMATION SCIENCE, 2008, 131 : 41 - 50
[12] A distributed memory parallel Gauss-Seidel algorithm for linear algebraic systems
Shang, Yueqiang
COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2009, 57 (08) : 1369 - 1376
[13] THE PARALLEL QR FACTORIZATION ALGORITHM FOR TRIDIAGONAL LINEAR-SYSTEMS
AMODIO, P
BRUGNANO, L
PARALLEL COMPUTING, 1995, 21 (07) : 1097 - 1110
[14] Implementation of QR and LQ Decompositions on Shared Memory Parallel Computing Systems
Egunov, V
Andreev, A.
2016 2ND INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING, APPLICATIONS AND MANUFACTURING (ICIEAM), 2016,
[15] PANEL - PARALLEL AND DISTRIBUTED COMPUTING DISTRIBUTED MEMORY OR SHARED MEMORY-SYSTEMS
REIJNS, GL
IFIP TRANSACTIONS A-COMPUTER SCIENCE AND TECHNOLOGY, 1992, 12 : 543 - 544
[16] Modeling Memory Contention between Communications and Computations in Distributed HPC Systems
Denis, Alexandre
Jeannot, Emmanuel
Swartvagher, Philippe
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 476 - 485
[17] Parallel algorithm for block-tridiagonal linear systems on distributed-memory multicomputers
Luo, Zhi-Gang
Li, Xiao-Mei
Jisuanji Xuebao/Chinese Journal of Computers, 2000, 23 (10) : 1028 - 1034
[18] A distributed memory parallel algorithm for the efficient computation of sensitivities of differential-algebraic systems
Keeping, BR
Pantelides, CC
MATHEMATICS AND COMPUTERS IN SIMULATION, 1998, 44 (06) : 545 - 558
[19] PARALLEL ANNEALING ON DISTRIBUTED-MEMORY SYSTEMS
LEE, FH
STILES, GS
SWAMINATHAN, V
PROGRAMMING AND COMPUTER SOFTWARE, 1995, 21 (01) : 1 - 8
[20] Parallel implementation of a ray tracing algorithm for distributed memory parallel computers
Lee, TY
Raghavendra, CS
Nicholas, JB
CONCURRENCY-PRACTICE AND EXPERIENCE, 1997, 9 (10) : 947 - 965
