PARALLEL SCALABILITY OF THREE-LEVEL FROSch PRECONDITIONERS TO 220000 CORES USING THE THETA SUPERCOMPUTER

Cited by: 2

Authors:

Heinlein, Alexander [1]

Rheinbach, Oliver [2,3]

Roever, Friederike [2,3]

Affiliations:

[1] Delft Univ Technol, Delft Inst Appl Math, Fac Elect Engn Math Comp Sci, Mekelweg 4, NL-2628 CD Delft, Netherlands

[2] Tech Univ Bergakad Freiberg, Fak Math & Informat, Zentrum effiziente Hochtemperatur Stoffwandlun Ze, D-09596 Freiberg, Germany

[3] Tech Univ Bergakad Freiberg, Fak Math & Informat, Univ rechenzentrum URZ, D-09596 Freiberg, Germany

Source:

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2023 / Vol. 45 / No. 3

Keywords:

domain decomposition; high performance computing; overlapping Schwarz; software; Trilinos; multilevel preconditioners; multilevel Schwarz

DOI:

10.1137/21M1431205

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

The parallel performance of the three-level fast and robust overlapping Schwarz (FROSch) preconditioners is investigated for linear elasticity. The FROSch framework is part of the Trilinos software library and contains a parallel implementation of different preconditioners with energy minimizing coarse spaces of generalized Dryja-Smith-Widlund type. The three-level extension is constructed by a recursive application of the FROSch preconditioner to the coarse problem. In this paper, the additional steps in the implementation in order to apply the FROSch preconditioner recursively are described in detail. Furthermore, it is shown that no explicit geometric information is needed in the recursive application of the preconditioner. In particular, the rigid body modes, including the rotations, can be interpolated on the coarse level without additional geometric information. Parallel results for a three-dimensional linear elasticity problem obtained on the Theta supercomputer (Argonne Leadership Computing Facility, Argonne, IL) using up to 220 000 cores are discussed and compared to results obtained on the SuperMUC-NG supercomputer (Leibniz Supercomputing Centre, Garching, Germany). Notably, it can be observed that a hierarchical communication operation in FROSch related to the coarse operator starts to dominate the computing time on Theta, which has a dragonfly interconnect, for 100 000 message passing interface (MPI) ranks or more. The same operation, however, scales well and stays within the order of a second in all experiments performed on SuperMUC-NG, which uses a fat tree network. Using hybrid MPI/OpenMP parallelization, the onset of the MPI communication problem on Theta can be delayed. Further analysis of the performance of FROSch on large supercomputers with dragonfly interconnects will be necessary.
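As a rough sketch of the recursive construction summarized in the abstract, a two-level overlapping Schwarz preconditioner with an additive coarse level is commonly written as below, and a three-level variant follows by replacing the exact coarse solve with a preconditioner of the same type. The notation (K, R_i, \Phi, and the subscript 0 for coarse-level quantities) is assumed here for illustration and is not taken from this record; the paper's own notation and details may differ.

```latex
% Two-level additive overlapping Schwarz preconditioner for K x = b.
% Assumed notation:
%   R_i  : restriction to overlapping subdomain i,  K_i = R_i K R_i^T
%   \Phi : matrix whose columns are the coarse basis functions
\[
  M_{\mathrm{2L}}^{-1}
    = \Phi K_0^{-1} \Phi^{T} + \sum_{i=1}^{N} R_i^{T} K_i^{-1} R_i ,
  \qquad K_0 = \Phi^{T} K \Phi .
\]
% Three-level variant: the exact coarse solve K_0^{-1} is replaced by a
% preconditioner M_0^{-1} of the same form, built for the coarse matrix K_0.
\[
  M_{\mathrm{3L}}^{-1}
    = \Phi M_0^{-1} \Phi^{T} + \sum_{i=1}^{N} R_i^{T} K_i^{-1} R_i ,
\]
\[
  M_0^{-1}
    = \Phi_0 K_{00}^{-1} \Phi_0^{T}
      + \sum_{j=1}^{N_0} R_{0j}^{T} K_{0j}^{-1} R_{0j} ,
  \qquad K_{00} = \Phi_0^{T} K_0 \Phi_0 ,
  \quad K_{0j} = R_{0j} K_0 R_{0j}^{T} .
\]
```

In this reading, only the innermost matrix K_{00} is solved directly, which is why the size of the coarse problem grows more slowly with the number of subdomains than in the two-level case.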


Pages: S173-S198

Number of pages: 26
