A Scalable MPI_Comm_split Algorithm for Exascale Computing

Cited by: 0
Authors
Sack, Paul [1]
Gropp, William [1]
Affiliation
[1] Univ Illinois, Urbana, IL 61801 USA
Keywords
PERFORMANCE;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Existing algorithms for creating communicators in MPI programs will not scale well to future exascale supercomputers containing millions of cores. In this work, we present a novel communicator-creation algorithm that does scale well to millions of processes using three techniques: replacing the sorting at the end of MPI_Comm_split with merging as the color and key table is built, sorting the color and key table in parallel, and using a distributed table to store the output communicator data rather than a replicated table. This reduces the time cost of MPI_Comm_split in the worst case we consider from 22 seconds to 0.37 seconds. Existing algorithms build a table with as many entries as processes, using vast amounts of memory. Our algorithm uses a small, fixed amount of memory per communicator after MPI_Comm_split has finished and uses a fraction of the memory used by the conventional algorithm for temporary storage during the execution of MPI_Comm_split.
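The first technique named in the abstract, merging the color and key table as it is built instead of sorting it at the end, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names `merge_gather` and `new_ranks` are hypothetical, the binary-tree combining pattern is assumed for illustration, and no MPI calls are made. It relies on the standard MPI_Comm_split semantics that processes sharing a color form one communicator, ordered by key with ties broken by old rank.

```python
from heapq import merge


def merge_gather(entries):
    """Simulate a tree-structured gather that keeps every partial table sorted.

    entries[r] is the (color, key) pair supplied by old rank r.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Each simulated process starts with a one-entry, trivially sorted table.
    tables = [[(color, key, rank)] for rank, (color, key) in enumerate(entries)]
    # Combine neighbouring tables pairwise until one remains (assumed tree pattern).
    while len(tables) > 1:
        combined = []
        for i in range(0, len(tables), 2):
            if i + 1 < len(tables):
                # Merge two already-sorted tables instead of re-sorting them.
                combined.append(list(merge(tables[i], tables[i + 1])))
            else:
                combined.append(tables[i])
        tables = combined
    return tables[0] if tables else []


def new_ranks(table):
    """Map old rank -> (color, new rank) from a table sorted by (color, key, rank)."""
    result = {}
    prev_color = object()  # sentinel that matches no real color
    next_rank = 0
    for color, _key, old_rank in table:
        if color != prev_color:
            # A new color group starts, so ranks restart at 0.
            prev_color, next_rank = color, 0
        result[old_rank] = (color, next_rank)
        next_rank += 1
    return result
```

Because every partial table is already ordered by (color, key, old rank) when it is combined, the final table needs no separate sorting pass, and new ranks follow from one linear scan. The parallel sort and the distributed output table described in the abstract are not modelled here.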
Pages: 1 / 10
Page count: 10