A NOVEL PARALLEL QR ALGORITHM FOR HYBRID DISTRIBUTED MEMORY HPC SYSTEMS

Cited by: 18

Authors:

Granat, Robert [1,2]

Kagstrom, Bo [1,2]

Kressner, Daniel [3]

Affiliations:

[1] Umea Univ, Dept Comp Sci, SE-90187 Umea, Sweden

[2] Umea Univ, HPC2N, SE-90187 Umea, Sweden

[3] ETH, Seminar Appl Math, CH-8092 Zurich, Switzerland

Source:

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2010 / Vol. 32 / No. 4

Funding:

Swedish Research Council;

Keywords:

eigenvalue problem; nonsymmetric QR algorithm; multishift; bulge chasing; parallel computations; level 3 performance; aggressive early deflation; parallel algorithms; hybrid distributed memory systems; ALGEBRAIC RICCATI EQUATION; LEVEL 3 BLAS; MATRIX; REDUCTION; DEFLATION; SOFTWARE; SHIFTS

DOI:

10.1137/090756934

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
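As a point of reference for the abstract above, the following is a minimal serial sketch of the textbook single-shift QR iteration, the classical method that multishift variants of the QR algorithm extend. It is not the authors' algorithm: there is no chain of bulges, no aggressive early deflation, and no MPI or OpenMP. The function names `qr_step` and `hessenberg_eigenvalues`, the tolerance, and the step limit are illustrative assumptions, and the input is assumed to be already in upper Hessenberg form.

```python
import cmath


def qr_step(H, m, mu):
    """One shifted QR step, in place, on the leading m-by-m block of H.

    Factors (H - mu*I) = Q*R with Givens rotations, then forms R*Q + mu*I.
    Only the active block is updated, which is enough for eigenvalues.
    """
    for i in range(m):
        H[i][i] -= mu
    rotations = []
    for k in range(m - 1):
        a, b = H[k][k], H[k + 1][k]
        r = (abs(a) ** 2 + abs(b) ** 2) ** 0.5
        c, s = (a / r, b / r) if r else (1.0, 0.0)
        rotations.append((c, s))
        # Left-multiply rows k and k+1 to annihilate the subdiagonal entry.
        for j in range(k, m):
            x, y = H[k][j], H[k + 1][j]
            H[k][j] = c.conjugate() * x + s.conjugate() * y
            H[k + 1][j] = -s * x + c * y
    for k, (c, s) in enumerate(rotations):
        # Right-multiply columns k and k+1 by the conjugate transpose.
        for i in range(k + 2):
            x, y = H[i][k], H[i][k + 1]
            H[i][k] = x * c + y * s
            H[i][k + 1] = -x * s.conjugate() + y * c.conjugate()
    for i in range(m):
        H[i][i] += mu


def hessenberg_eigenvalues(H, tol=1e-12, max_steps=500):
    """Eigenvalues of a non-empty upper Hessenberg matrix (list of rows)."""
    H = [[complex(v) for v in row] for row in H]
    m = len(H)
    eigs = []
    steps = 0
    while m > 1:
        sub = abs(H[m - 1][m - 2])
        if sub <= tol * (abs(H[m - 1][m - 1]) + abs(H[m - 2][m - 2])):
            # Negligible subdiagonal entry: deflate the last eigenvalue.
            eigs.append(H[m - 1][m - 1])
            m -= 1
            continue
        if steps == max_steps:
            raise RuntimeError("QR iteration did not converge")
        # Wilkinson shift: eigenvalue of the trailing 2x2 block nearest
        # to its bottom-right entry.
        a, b = H[m - 2][m - 2], H[m - 2][m - 1]
        c, d = H[m - 1][m - 2], H[m - 1][m - 1]
        half_tr = (a + d) / 2
        disc = cmath.sqrt(half_tr * half_tr - (a * d - b * c))
        mu = min((half_tr + disc, half_tr - disc), key=lambda z: abs(z - d))
        qr_step(H, m, mu)
        steps += 1
    eigs.append(H[0][0])
    return eigs
```

For example, `hessenberg_eigenvalues([[1, -1, 1], [1, 0, 0], [0, 1, 0]])` operates on the companion matrix of x^3 - x^2 + x - 1 = (x - 1)(x^2 + 1), whose exact eigenvalues are 1, i, and -i. Each step in this sketch touches matrix entries one rotation at a time, which is the kind of fine-grained work that the multiwindow approach described in the abstract reorganizes into level 3 BLAS operations.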

Pages: 2345-2378

Number of pages: 34
