A NOVEL PARALLEL QR ALGORITHM FOR HYBRID DISTRIBUTED MEMORY HPC SYSTEMS

Cited by: 18
Authors
Granat, Robert [1, 2]
Kagstrom, Bo [1, 2]
Kressner, Daniel [3]
Affiliations
[1] Umea Univ, Dept Comp Sci, SE-90187 Umea, Sweden
[2] Umea Univ, HPC2N, SE-90187 Umea, Sweden
[3] ETH, Seminar Appl Math, CH-8092 Zurich, Switzerland
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2010 / Vol. 32 / No. 04
Funding
Swedish Research Council
Keywords
eigenvalue problem; nonsymmetric QR algorithm; multishift; bulge chasing; parallel computations; level 3 performance; aggressive early deflation; parallel algorithms; hybrid distributed memory systems; ALGEBRAIC RICCATI EQUATION; LEVEL 3 BLAS; MATRIX; REDUCTION; DEFLATION; SOFTWARE; SHIFTS
DOI
10.1137/090756934
Chinese Library Classification
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
Pages: 2345 - 2378
Page count: 34
Related papers
50 in total
  • [41] A hybrid parallel Delaunay image-to-mesh conversion algorithm scalable on distributed-memory clusters
    Feng, Daming
    Chernikov, Andrey N.
    Chrisochoides, Nikos P.
    COMPUTER-AIDED DESIGN, 2018, 103 : 34 - 46
  • [42] Distributed parallel volume rendering on shared memory systems
    Hancock, D.J.
    Hubbold, R.J.
    Future Generation Computer Systems, 1998, 13 (4-5) : 251 - 259
  • [43] Numerical integration on distributed-memory parallel systems
    Ciegis, R
    Sablinskas, R
    Wasniewski, J
    RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, 1997, 1332 : 329 - 336
  • [44] Distributed parallel volume rendering on shared memory systems
    Hancock, DJ
    Hubbold, RJ
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 1997, 1225 : 157 - 164
  • [45] Distributed parallel volume rendering on shared memory systems
    Hancock, DJ
    Hubbold, RJ
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 1998, 13 (4-5) : 251 - 259
  • [46] Analysis of the impact of memory in distributed parallel processing systems
    Peris, Vinod G.J.
    Squillante, Mark S.
    Naik, Vijay K.
    Performance Evaluation Review, 1994, 22 (01) : 5 - 18
  • [47] HPC optimal parallel communication algorithm for the simulation of fractional-order systems
    C. Bonchiş
    E. Kaslik
    F. Roşu
    The Journal of Supercomputing, 2019, 75 : 1014 - 1025
  • [48] HPC optimal parallel communication algorithm for the simulation of fractional-order systems
    Bonchis, C.
    Kaslik, E.
    Rosu, F.
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (03) : 1014 - 1025
  • [49] TDR: A distributed-memory parallel routing algorithm for FPGAs
    Cabral, LAF
    Aude, RS
    Maculan, N
    FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS: RECONFIGURABLE COMPUTING IS GOING MAINSTREAM, 2002, 2438 : 263 - 270
  • [50] A Distributed Memory Parallel Fourth-Order IADEMF Algorithm
    Abu Mansor, Noreliza
    Zulkifle, Ahmad Kamal
    Alias, Norma
    Hasan, Mohammad Khatim
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (09) : 599 - 607