A NOVEL PARALLEL QR ALGORITHM FOR HYBRID DISTRIBUTED MEMORY HPC SYSTEMS

Cited by: 18
Authors
Granat, Robert [1, 2]
Kagstrom, Bo [1, 2]
Kressner, Daniel [3]
Affiliations
[1] Umea Univ, Dept Comp Sci, SE-90187 Umea, Sweden
[2] Umea Univ, HPC2N, SE-90187 Umea, Sweden
[3] ETH, Seminar Appl Math, CH-8092 Zurich, Switzerland
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2010 / Vol. 32 / No. 4
Funding
Swedish Research Council;
Keywords
eigenvalue problem; nonsymmetric QR algorithm; multishift; bulge chasing; parallel computations; level 3 performance; aggressive early deflation; parallel algorithms; hybrid distributed memory systems; ALGEBRAIC RICCATI EQUATION; LEVEL 3 BLAS; MATRIX; REDUCTION; DEFLATION; SOFTWARE; SHIFTS;
DOI
10.1137/090756934
Chinese Library Classification (CLC) number
O29 [Applied Mathematics];
Discipline classification code
070104;
Abstract
A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
Pages: 2345 - 2378
Page count: 34
Related papers
50 in total
  • [11] Multi-step Parallel PNN Algorithm for Distributed-Memory Systems
    Wakatani, Akiyoshi
    COMPUTER AND INFORMATION SCIENCE, 2008, 131 : 41 - 50
  • [12] A distributed memory parallel Gauss-Seidel algorithm for linear algebraic systems
    Shang, Yueqiang
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2009, 57 (08) : 1369 - 1376
  • [13] THE PARALLEL QR FACTORIZATION ALGORITHM FOR TRIDIAGONAL LINEAR-SYSTEMS
    AMODIO, P
    BRUGNANO, L
    PARALLEL COMPUTING, 1995, 21 (07) : 1097 - 1110
  • [14] Implementation of QR and LQ Decompositions on Shared Memory Parallel Computing Systems
    Egunov, V
    Andreev, A.
    2016 2ND INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING, APPLICATIONS AND MANUFACTURING (ICIEAM), 2016,
  • [15] PANEL - PARALLEL AND DISTRIBUTED COMPUTING DISTRIBUTED MEMORY OR SHARED MEMORY-SYSTEMS
    REIJNS, GL
    IFIP TRANSACTIONS A-COMPUTER SCIENCE AND TECHNOLOGY, 1992, 12 : 543 - 544
  • [16] Modeling Memory Contention between Communications and Computations in Distributed HPC Systems
    Denis, Alexandre
    Jeannot, Emmanuel
    Swartvagher, Philippe
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 476 - 485
  • [17] Parallel algorithm for block-tridiagonal linear systems on distributed-memory multicomputers
    Luo, Zhi-Gang
    Li, Xiao-Mei
    Jisuanji Xuebao/Chinese Journal of Computers, 2000, 23 (10) : 1028 - 1034
  • [18] A distributed memory parallel algorithm for the efficient computation of sensitivities of differential-algebraic systems
    Keeping, BR
    Pantelides, CC
    MATHEMATICS AND COMPUTERS IN SIMULATION, 1998, 44 (06) : 545 - 558
  • [19] PARALLEL ANNEALING ON DISTRIBUTED-MEMORY SYSTEMS
    LEE, FH
    STILES, GS
    SWAMINATHAN, V
    PROGRAMMING AND COMPUTER SOFTWARE, 1995, 21 (01) : 1 - 8
  • [20] Parallel implementation of a ray tracing algorithm for distributed memory parallel computers
    Lee, TY
    Raghavendra, CS
    Nicholas, JB
    CONCURRENCY-PRACTICE AND EXPERIENCE, 1997, 9 (10) : 947 - 965