Faster and Scalable MPI Applications Launching

Cited by: 0
Authors
Dong, Yong [1]
Dai, Yiqin [1]
Xie, Min [1]
Lu, Kai [1]
Wang, Ruibo [1]
Chen, Juan [1]
Shao, Mingtian [1]
Wang, Zheng [2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds LS2 9JT, England
Keywords
Peer-to-peer computing; Hardware; Libraries; Multiprocessor interconnection; Production; Optimization; Full stack; Message passing interface (MPI); high performance computing (HPC); MPI application optimization
DOI
10.1109/TPDS.2022.3218077
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Distributed parallel MPI applications are the dominant workload in many high-performance computing systems. While optimizing MPI application execution is a well-studied field, little work has considered optimizing the initial MPI application launching phase, which incurs extensive cross-machine communication and synchronization. The overhead of MPI application launching can be expensive, accounting for more than a million core hours per 10K nodes annually on the production Tianhe-2A supercomputer, and it will increase as the number of parallel machines grows. Therefore, it is critical to optimize the MPI application launching process. This paper presents a novel approach to optimizing the MPI application launch. Our approach adopts a location-aware address generation rule to eliminate the need for address exchange and a topology-aware global communication scheme to optimize cross-machine synchronization. We then design a new application launch procedure that supports the proposed optimizations and further reduces the pressure on the shared I/O system. Our techniques have been deployed to production in the Tianhe-2A supercomputer and the Next Generation Tianhe Supercomputer. Experimental results show that our approach scales well and outperforms alternative schemes, reducing the MPI application launching time by over 29% with 320K MPI processes.
Pages: 264-279
Number of pages: 16