Faster and Scalable MPI Applications Launching

Cited by: 0

Authors:

Dong, Yong [1]

Dai, Yiqin [1]

Xie, Min [1]

Lu, Kai [1]

Wang, Ruibo [1]

Chen, Juan [1]

Shao, Mingtian [1]

Wang, Zheng [2]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China

[2] Univ Leeds, Sch Comp, Leeds LS2 9JT, England

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2024 / Vol. 35 / Issue 02

Keywords:

Peer-to-peer computing; Hardware; Libraries; Multiprocessor interconnection; Production; Optimization; Full stack; Message passing interface (MPI); high performance computing (HPC); MPI application optimization

DOI:

10.1109/TPDS.2022.3218077

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Distributed parallel MPI applications are the dominant workload in many high-performance computing systems. While optimizing MPI application execution is a well-studied field, little work has considered optimizing the initial MPI application launching phase, which incurs extensive cross-machine communications and synchronization. The overhead of MPI application launching can be expensive, accounting for more than million core hours per 10K nodes annually on the production Tianhe-2A supercomputer, which will increase as the number of parallel machines used grows. Therefore, it is critical to optimize the MPI application launching process. This paper presents a novel approach to optimizing the MPI application launch. Our approach adopts a location-aware address generation rule to eliminate the need for address exchange and a topology-aware global communication scheme to optimize cross-machine synchronization. We then design a new application launch procedure to support the proposed optimizations to further reduce the pressure of the shared I/O system. Our techniques have been deployed to production in the Tianhe-2A supercomputer and the Next Generation Tianhe Supercomputer. Experimental results show that our approach scales well and outperforms alternative schemes, reducing the MPI application launching time by over 29% with 320K MPI processes.


Pages: 264-279

Page count: 16

