A NOVEL PARALLEL QR ALGORITHM FOR HYBRID DISTRIBUTED MEMORY HPC SYSTEMS

Cited by: 18
Authors
Granat, Robert [1, 2]
Kagstrom, Bo [1, 2]
Kressner, Daniel [3]
Affiliations
[1] Umea Univ, Dept Comp Sci, SE-90187 Umea, Sweden
[2] Umea Univ, HPC2N, SE-90187 Umea, Sweden
[3] ETH, Seminar Appl Math, CH-8092 Zurich, Switzerland
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2010 / Vol. 32 / No. 4
Funding
Swedish Research Council
Keywords
eigenvalue problem; nonsymmetric QR algorithm; multishift; bulge chasing; parallel computations; level 3 performance; aggressive early deflation; parallel algorithms; hybrid distributed memory systems; ALGEBRAIC RICCATI EQUATION; LEVEL 3 BLAS; MATRIX; REDUCTION; DEFLATION; SOFTWARE; SHIFTS
DOI
10.1137/090756934
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline code
070104
Abstract
A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
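For readers unfamiliar with the underlying iteration, here is a minimal serial sketch of the classical shifted QR algorithm on an upper Hessenberg matrix, written in Python. It is not the authors' method. It performs one explicit single-shift step at a time using Givens rotations and relies only on the conventional deflation test (a negligible trailing subdiagonal entry). The paper instead chases chains of many small bulges in multiple windows with level 3 BLAS updates, applies aggressive early deflation, and parallelizes with MPI and OpenMP. Assumptions: the input is already upper Hessenberg, its eigenvalues are real, and the shift is the eigenvalue of the trailing 2x2 block closest to the last diagonal entry (a Wilkinson-style choice). All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def shifted_qr_step(H: Matrix, mu: float) -> None:
    """One explicit shifted QR step in place: H - mu*I = Q R, then H <- R Q + mu*I.

    H must be upper Hessenberg; Q is a product of n-1 Givens rotations,
    so the updated H stays upper Hessenberg and is similar to the input.
    """
    n = len(H)
    for i in range(n):
        H[i][i] -= mu
    rotations = []
    # Left pass: rotate rows k and k+1 to annihilate the subdiagonal H[k+1][k].
    for k in range(n - 1):
        a, b = H[k][k], H[k + 1][k]
        r = math.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        rotations.append((c, s))
        for j in range(k, n):
            x, y = H[k][j], H[k + 1][j]
            H[k][j] = c * x + s * y
            H[k + 1][j] = -s * x + c * y
        H[k + 1][k] = 0.0  # store an exact zero instead of rounding residue
    # Right pass: form R Q by applying the transposed rotations to columns k, k+1.
    for k, (c, s) in enumerate(rotations):
        for i in range(k + 2):  # rows below k+1 are zero in these columns
            x, y = H[i][k], H[i][k + 1]
            H[i][k] = c * x + s * y
            H[i][k + 1] = -s * x + c * y
    for i in range(n):
        H[i][i] += mu


def trailing_shift(H: Matrix) -> float:
    """Eigenvalue of the trailing 2x2 block of H closest to its last diagonal entry.

    Falls back to the Rayleigh-quotient shift H[n-1][n-1] when that block
    has a complex-conjugate eigenvalue pair.
    """
    n = len(H)
    a, b = H[n - 2][n - 2], H[n - 2][n - 1]
    c, d = H[n - 1][n - 2], H[n - 1][n - 1]
    disc = ((a - d) / 2.0) ** 2 + b * c
    if disc < 0.0:
        return d
    mid, root = (a + d) / 2.0, math.sqrt(disc)
    mu1, mu2 = mid + root, mid - root
    return mu1 if abs(mu1 - d) <= abs(mu2 - d) else mu2


def hessenberg_qr_eigenvalues(H: Matrix, tol: float = 1e-12,
                              max_iter: int = 500) -> List[float]:
    """Eigenvalues of an upper Hessenberg matrix, assumed to be real."""
    H = [[float(x) for x in row] for row in H]  # work on a copy
    eigenvalues: List[float] = []
    n = len(H)
    iterations = 0
    while n > 1:
        scale = abs(H[n - 1][n - 1]) + abs(H[n - 2][n - 2]) or 1.0
        if abs(H[n - 1][n - 2]) <= tol * scale:
            # Deflation: H is now block upper triangular, so the trailing
            # entry is an eigenvalue and the leading block holds the rest.
            eigenvalues.append(H[n - 1][n - 1])
            n -= 1
            H = [row[:n] for row in H[:n]]
            continue
        if iterations >= max_iter:
            raise RuntimeError("QR iteration did not converge")
        shifted_qr_step(H, trailing_shift(H))
        iterations += 1
    if n == 1:
        eigenvalues.append(H[0][0])
    return eigenvalues


if __name__ == "__main__":
    # Diagonally scaled, nonsymmetric tridiagonal matrix whose exact
    # eigenvalues are 3 - sqrt(3), 3, and 3 + sqrt(3).
    A = [[4.0, 0.5, 0.0], [2.0, 3.0, 0.5], [0.0, 2.0, 2.0]]
    print(sorted(hessenberg_qr_eigenvalues(A)))
```

Roughly speaking, aggressive early deflation generalizes the deflation test in this sketch: instead of waiting for a single subdiagonal entry to become negligible, it examines a larger trailing window for eigenvalues that have already converged and deflates them early.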
Pages: 2345-2378
Page count: 34
Related papers
50 in total
  • [1] A PARALLEL QZ ALGORITHM FOR DISTRIBUTED MEMORY HPC SYSTEMS
    Adlerborn, Bjoern
    Kagstroem, Bo
    Kressner, Daniel
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (05): C480-C503
  • [2] A parallel implementation of the nonsymmetric QR algorithm for distributed memory architectures
    Henry, G
    Watkins, D
    Dongarra, J
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2002, 24 (01): 284-311
  • [3] Parallel external selection algorithm on distributed memory systems
    Zhong, C
    Chen, GL
    Yan, C
    FIFTH INTERNATIONAL CONFERENCE ON ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PROCEEDINGS, 2002: 243-246
  • [4] Parallel algorithm for tridiagonal linear equations for distributed memory systems
    Chi, Lihua
    Li, Xiaomei
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 35 (11): 1004-1107
  • [5] New parallel scheduling algorithm on distributed-memory systems
    Lu, G.H.
    Sun, S.X.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2001, 38 (02)
  • [6] A novel hybrid resampling algorithm for parallel/distributed particle filters
    Zhang, Xudong
    Zhao, Liang
    Zhong, Wei
    Gu, Feng
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 151: 24-37
  • [7] Parallel Lepp-bisection algorithm over distributed memory systems
    Rodriguez, Pedro A.
    Rivara, Maria-Cecilia
    PROCEEDINGS OF 2013 32ND INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY (SCCC), 2016: 1-3
  • [8] A hybrid shared/distributed memory parallel genetic algorithm for optimization of laminate composites
    Rocha, I. B. C. M.
    Parente, E., Jr.
    Melo, A. M. C.
    COMPOSITE STRUCTURES, 2014, 107: 288-297
  • [9] Power profiling of Cholesky and QR factorizations on distributed memory systems
    Bosilca, George
    Ltaief, Hatem
    Dongarra, Jack
    COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2014, 29 (02): 139-147
  • [10] Parallel QR factorization for hybrid message passing/shared memory operation
    Dunn, IN
    Meyer, GGL
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2001, 338 (05): 601-613