A NOVEL PARALLEL QR ALGORITHM FOR HYBRID DISTRIBUTED MEMORY HPC SYSTEMS

Cited by: 18
Authors
Granat, Robert [1 ,2 ]
Kagstrom, Bo [1 ,2 ]
Kressner, Daniel [3 ]
Affiliations
[1] Umea Univ, Dept Comp Sci, SE-90187 Umea, Sweden
[2] Umea Univ, HPC2N, SE-90187 Umea, Sweden
[3] ETH, Seminar Appl Math, CH-8092 Zurich, Switzerland
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2010, Vol. 32, No. 4
Funding
Swedish Research Council
Keywords
eigenvalue problem; nonsymmetric QR algorithm; multishift; bulge chasing; parallel computations; level 3 performance; aggressive early deflation; parallel algorithms; hybrid distributed memory systems; ALGEBRAIC RICCATI EQUATION; LEVEL 3 BLAS; MATRIX; REDUCTION; DEFLATION; SOFTWARE; SHIFTS
DOI
10.1137/090756934
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
Pages: 2345-2378
Page count: 34
Related papers
50 in total
  • [31] Parallel algorithm design on some distributed systems
    Jiachang Sun
    Xuebin Chi
    Jianwen Cao
    Linbo Zhang
    Journal of Computer Science and Technology, 1997, 12 (2) : 97 - 104
  • [32] A novel parallel algorithm for large-scale Fock matrix construction with small locally distributed memory architectures: RT parallel algorithm
    Takashima, H
    Yamada, S
    Obara, S
    Kitamura, K
    Inabata, S
    Miyakawa, N
    Tanabe, K
    Nagashima, U
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2002, 23 (14) : 1337 - 1346
  • [33] ecoHMEM: Improving Object Placement Methodology for Hybrid Memory Systems in HPC
    Jorda, Marc
    Rai, Siddharth
    Ayguade, Eduard
    Labarta, Jesus
    Pena, Antonio J.
    2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 278 - 288
  • [34] A Hybrid Parallel Algorithm for the Auction Algorithm in Multicore Systems
    Nascimento, A. P.
    Vasconcelos, C. N.
    Jamel, F. S.
    Sena, A. C.
    2016 28TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW), 2016, : 73 - 78
  • [35] A distributed shared parallel IO system for HPC
    Guo Yu-Feng
    Li Qiong
    Liu Guang-Ming
    Cao Yue-Sheng
    Zhang Lei
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2008, : 229 - 234
  • [36] Parallel Implementation of a Low Order Algorithm for Dynamics of Multibody Systems on a Distributed Memory Computing System
    S. Duan
    K.S. Anderson
    Engineering with Computers, 2000, 16 : 96 - 108
  • [37] Parallel implementation of a low order algorithm for dynamics of multibody systems on a distributed memory computing system
    Duan, S
    Anderson, KS
    ENGINEERING WITH COMPUTERS, 2000, 16 (02) : 96 - 108
  • [38] An improved parallel algorithm for certain Toeplitz cyclic tridiagonal systems on distributed-memory multicomputer
    Zhang, XB
    Luo, ZG
    Li, XM
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2003, 2834 : 292 - 300
  • [39] A Hybrid Parallel Delaunay Image-to-Mesh Conversion Algorithm Scalable on Distributed-Memory Clusters
    Feng, Daming
    Chernikov, Andrey N.
    Chrisochoides, Nikos P.
    25TH INTERNATIONAL MESHING ROUNDTABLE, 2016, 163 : 59 - 71
  • [40] HY-DBSCAN: A hybrid parallel DBSCAN clustering algorithm scalable on distributed-memory computers
    Wu, Guoqing
    Cao, Liqiang
    Tian, Hongyun
    Wang, Wei
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2022, 168 : 57 - 69