PARALLEL SCALABILITY OF THREE-LEVEL FROSch PRECONDITIONERS TO 220000 CORES USING THE THETA SUPERCOMPUTER

Cited by: 2
Authors
Heinlein, Alexander [1]
Rheinbach, Oliver [2, 3]
Roever, Friederike [2, 3]
Affiliations
[1] Delft Univ Technol, Delft Inst Appl Math, Fac Elect Engn Math Comp Sci, Mekelweg 4, NL-2628 CD Delft, Netherlands
[2] Tech Univ Bergakad Freiberg, Fak Math & Informat, Zentrum effiziente Hochtemperatur Stoffwandlun Ze, D-09596 Freiberg, Germany
[3] Tech Univ Bergakad Freiberg, Fak Math & Informat, Univ rechenzentrum URZ, D-09596 Freiberg, Germany
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2023 / Vol. 45 / No. 03
Keywords
domain decomposition; high performance computing; overlapping Schwarz; software; Trilinos; multilevel preconditioners; multilevel Schwarz
DOI
10.1137/21M1431205
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
The parallel performance of the three-level fast and robust overlapping Schwarz (FROSch) preconditioners is investigated for linear elasticity. The FROSch framework is part of the Trilinos software library and contains a parallel implementation of different preconditioners with energy minimizing coarse spaces of generalized Dryja-Smith-Widlund type. The three-level extension is constructed by a recursive application of the FROSch preconditioner to the coarse problem. In this paper, the additional steps in the implementation in order to apply the FROSch preconditioner recursively are described in detail. Furthermore, it is shown that no explicit geometric information is needed in the recursive application of the preconditioner. In particular, the rigid body modes, including the rotations, can be interpolated on the coarse level without additional geometric information. Parallel results for a three-dimensional linear elasticity problem obtained on the Theta supercomputer (Argonne Leadership Computing Facility, Argonne, IL) using up to 220 000 cores are discussed and compared to results obtained on the SuperMUC-NG supercomputer (Leibniz Supercomputing Centre, Garching, Germany). Notably, it can be observed that a hierarchical communication operation in FROSch related to the coarse operator starts to dominate the computing time on Theta, which has a dragonfly interconnect, for 100 000 message passing interface (MPI) ranks or more. The same operation, however, scales well and stays within the order of a second in all experiments performed on SuperMUC-NG, which uses a fat tree network. Using hybrid MPI/OpenMP parallelization, the onset of the MPI communication problem on Theta can be delayed. Further analysis of the performance of FROSch on large supercomputers with dragonfly interconnects will be necessary.
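The abstract refers to the rigid body modes of linear elasticity, including the rotations. As a minimal illustration of what these modes are, the following Python sketch builds the three translations and three linearized rotations from node coordinates. It is not the FROSch implementation and does not use the Trilinos API. The function name `rigid_body_modes`, the node-wise degree-of-freedom ordering, and the unit-cube test geometry are assumptions made for this example. Note that the sketch uses coordinates explicitly, whereas the abstract states that the recursive application needs no additional geometric information on the coarse level.

```python
import numpy as np


def rigid_body_modes(coords):
    """Return the six rigid body modes of 3D linear elasticity.

    coords: (n, 3) array of node coordinates (hypothetical input).
    Result: (3 * n, 6) array. Columns 0-2 are the translations in
    x, y, z; columns 3-5 are the linearized rotations about the
    x, y, z axes. Degrees of freedom are ordered node by node:
    (ux, uy, uz) of node 0, then node 1, and so on (an assumption).
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    zero = np.zeros(n)

    modes = np.zeros((n, 3, 6))
    # Translations: constant unit displacement in each direction.
    modes[:, 0, 0] = 1.0
    modes[:, 1, 1] = 1.0
    modes[:, 2, 2] = 1.0
    # Linearized rotations: u = e_k x r for each coordinate axis e_k.
    modes[:, :, 3] = np.column_stack([zero, -z, y])   # about x
    modes[:, :, 4] = np.column_stack([z, zero, -x])   # about y
    modes[:, :, 5] = np.column_stack([-y, x, zero])   # about z
    return modes.reshape(3 * n, 6)


# Usage: the eight corners of the unit cube (illustrative geometry).
corners = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
                   dtype=float)
Z = rigid_body_modes(corners)
```

For nodes that do not all lie on a single line, the six columns are linearly independent, which is why a three-dimensional elasticity problem has a six-dimensional null space when no boundary conditions are applied.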
Pages: S173-S198
Page count: 26