Massively Parallel Polar Decomposition on Distributed-memory Systems

Cited by: 3
Authors
Ltaief, Hatem [1]
Sukkari, Dalal [1]
Esposito, Aniello [2]
Nakatsukasa, Yuji [3]
Keyes, David [1]
Affiliations
[1] King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, 4700 King Abdullah Blvd, Jeddah 23955, Saudi Arabia
[2] Cray EMEA Res Lab, Bristol, Avon, England
[3] Univ Oxford, Math Inst, Oxford, England
Keywords
Polar decomposition; Zolotarev functions; parallel algorithms; strong scaling; distributed-memory systems; ITERATION; ALGORITHMS;
DOI
10.1145/3328723
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions, introduced by Zolotarev (ZOLO) in 1877, this new PD algorithm, ZOLO-PD, converges within two iterations even for ill-conditioned matrices, rather than the six iterations needed for the original QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate of up to 17, in contrast to the cubic convergence rate of QDWH. This comes at the price of higher arithmetic costs and a larger memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102,400 cores on two supercomputers and show that, in the presence of a large number of processing units, ZOLO-PD outperforms QDWH by up to 2.3x, especially in situations where QDWH runs out of work, for instance in the strong-scaling mode of operation.
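To make the object being computed concrete, here is a minimal single-node sketch in Python/NumPy of an iterative polar decomposition A = U H. It is not the paper's method: it uses the plain (unweighted) Halley iteration, the cubically convergent scheme that QDWH refines with dynamic weights and a QR-based update, and it has none of the Zolotarev acceleration or distributed-memory parallelism described above. The function name `polar_halley`, the tolerance, and the test matrix are illustrative assumptions.

```python
import numpy as np


def polar_halley(A, tol=1e-12, max_iter=100):
    """Polar decomposition A = U @ H via the unweighted Halley iteration.

    Simplified stand-in for QDWH (assumption: A is real with full column
    rank). Returns the orthonormal factor U, the symmetric positive
    semidefinite factor H, and the number of iterations used.
    """
    n = A.shape[1]
    eye = np.eye(n)
    # Scale so that all singular values of the starting iterate are <= 1.
    X = A / np.linalg.norm(A, 2)
    iters = max_iter
    for k in range(1, max_iter + 1):
        G = X.T @ X
        # Halley step: X_next = X (3I + G) (I + 3G)^{-1}.
        # G is symmetric, so solve (I + 3G) Y = (3I + G) X^T and transpose.
        X_next = np.linalg.solve(eye + 3.0 * G, (3.0 * eye + G) @ X.T).T
        step = np.linalg.norm(X_next - X, "fro")
        X = X_next
        if step <= tol:
            iters = k
            break
    U = X
    H = U.T @ A
    H = 0.5 * (H + H.T)  # symmetrize to remove rounding asymmetry
    return U, H, iters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 5))
    U, H, iters = polar_halley(A)
    print("iterations:", iters)
    print("orthogonality error:", np.linalg.norm(U.T @ U - np.eye(5)))
    print("reconstruction error:", np.linalg.norm(U @ H - A))
```

Each Halley step maps a singular value s of the iterate to s(3 + s^2)/(1 + 3s^2), so small singular values grow by only about a factor of 3 per step. That slow start on ill-conditioned inputs is what the dynamic weighting in QDWH and the higher-degree Zolotarev functions in ZOLO-PD are meant to address.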
Pages: 15