PARALLEL SCALABILITY OF THREE-LEVEL FROSch PRECONDITIONERS TO 220000 CORES USING THE THETA SUPERCOMPUTER

Times cited: 2
Authors
Heinlein, Alexander [1]
Rheinbach, Oliver [2, 3]
Roever, Friederike [2, 3]
Affiliations
[1] Delft University of Technology, Delft Institute of Applied Mathematics, Faculty of Electrical Engineering, Mathematics and Computer Science, Mekelweg 4, NL-2628 CD Delft, Netherlands
[2] Technische Universität Bergakademie Freiberg, Fakultät für Mathematik und Informatik, Zentrum für effiziente Hochtemperatur-Stoffwandlung (ZeHS), D-09596 Freiberg, Germany
[3] Technische Universität Bergakademie Freiberg, Fakultät für Mathematik und Informatik, Universitätsrechenzentrum (URZ), D-09596 Freiberg, Germany
Source
SIAM Journal on Scientific Computing | 2023 / Vol. 45 / No. 3
Keywords
domain decomposition; high performance computing; overlapping Schwarz; software; Trilinos; multilevel preconditioners; multilevel Schwarz
DOI
10.1137/21M1431205
Chinese Library Classification
O29 [Applied mathematics]
Subject classification code
070104
Abstract
The parallel performance of the three-level fast and robust overlapping Schwarz (FROSch) preconditioners is investigated for linear elasticity. The FROSch framework is part of the Trilinos software library and contains a parallel implementation of different preconditioners with energy minimizing coarse spaces of generalized Dryja-Smith-Widlund type. The three-level extension is constructed by a recursive application of the FROSch preconditioner to the coarse problem. In this paper, the additional steps in the implementation in order to apply the FROSch preconditioner recursively are described in detail. Furthermore, it is shown that no explicit geometric information is needed in the recursive application of the preconditioner. In particular, the rigid body modes, including the rotations, can be interpolated on the coarse level without additional geometric information. Parallel results for a three-dimensional linear elasticity problem obtained on the Theta supercomputer (Argonne Leadership Computing Facility, Argonne, IL) using up to 220 000 cores are discussed and compared to results obtained on the SuperMUC-NG supercomputer (Leibniz Supercomputing Centre, Garching, Germany). Notably, it can be observed that a hierarchical communication operation in FROSch related to the coarse operator starts to dominate the computing time on Theta, which has a dragonfly interconnect, for 100 000 message passing interface (MPI) ranks or more. The same operation, however, scales well and stays within the order of a second in all experiments performed on SuperMUC-NG, which uses a fat tree network. Using hybrid MPI/OpenMP parallelization, the onset of the MPI communication problem on Theta can be delayed. Further analysis of the performance of FROSch on large supercomputers with dragonfly interconnects will be necessary.
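The abstract describes the three-level method as a recursive application of the two-level preconditioner to the coarse problem. The following is a minimal NumPy sketch of that recursion only, under loudly stated assumptions: dense matrices, a 1D Laplacian model problem instead of 3D linear elasticity, and a simple Nicolaides-type coarse space built from a single near-kernel vector instead of the GDSW-type energy-minimizing coarse spaces used in FROSch. The names `laplacian_1d`, `make_schwarz`, and `pcg` are hypothetical and are not part of the FROSch or Trilinos API.

```python
import numpy as np


def laplacian_1d(n):
    """Dense 1D finite-difference Laplacian with Dirichlet BCs (SPD)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def make_schwarz(K, z, parts, overlap=1):
    """Recursive additive overlapping Schwarz preconditioner (hypothetical sketch).

    parts lists the number of subdomains per level: [8] gives a two-level
    method, [8, 2] a three-level method (the coarse problem is itself
    preconditioned by a two-level method). z is a near-kernel vector of K.
    """
    n = K.shape[0]
    if not parts:
        # Coarsest level: exact inverse (acceptable for a tiny dense sketch).
        K_inv = np.linalg.inv(K)
        return lambda r: K_inv @ r
    N = parts[0]
    bounds = np.linspace(0, n, N + 1).astype(int)
    # Local solvers on overlapping subdomains (blocks extended by `overlap`).
    local = []
    for i in range(N):
        idx = np.arange(max(bounds[i] - overlap, 0), min(bounds[i + 1] + overlap, n))
        local.append((idx, np.linalg.inv(K[np.ix_(idx, idx)])))
    # Nicolaides-type coarse basis: z restricted to each nonoverlapping block.
    Phi = np.zeros((n, N))
    for i in range(N):
        Phi[bounds[i]:bounds[i + 1], i] = z[bounds[i]:bounds[i + 1]]
    K0 = Phi.T @ K @ Phi  # Galerkin coarse operator
    # Phi @ ones(N) reproduces z, so the coarse near-kernel vector is all
    # ones: it is obtained algebraically, without coordinates.
    coarse = make_schwarz(K0, np.ones(N), parts[1:], overlap)

    def apply(r):
        y = Phi @ coarse(Phi.T @ r)  # coarse correction
        for idx, K_loc_inv in local:
            y[idx] += K_loc_inv @ r[idx]  # local corrections
        return y

    return apply


def pcg(K, b, M, tol=1e-8, maxit=200):
    """Preconditioned conjugate gradients; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = M(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxit + 1):
        Kp = K @ p
        alpha = rz / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit


n = 64
K = laplacian_1d(n)
b = np.ones(n)
M2 = make_schwarz(K, np.ones(n), parts=[8])     # two-level
M3 = make_schwarz(K, np.ones(n), parts=[8, 2])  # three-level
x2, its2 = pcg(K, b, M2)
x, its = pcg(K, b, M3)
print("two-level iterations:", its2, "three-level iterations:", its)
```

In this simplified sketch, the coarse basis reproduces the near-kernel vector from the all-ones coarse vector, so the next level's near-kernel vector is available without geometric input. This loosely mirrors the abstract's point that no explicit geometric information is needed on the coarse level; the sketch is scalar and does not model the elasticity rigid body modes (including rotations) that FROSch handles.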
Pages: S173-S198
Number of pages: 26