ADAPT: An Event-Based Adaptive Collective Communication Framework

Cited by: 15
Authors
Luo, Xi [1]
Wu, Wei [2]
Bosilca, George [1]
Patinyasakdikul, Thananon [1]
Wang, Linnan [3]
Dongarra, Jack [1, 4]
Affiliations
[1] Univ Tennessee, Knoxville, TN 37996 USA
[2] Los Alamos Natl Lab, Los Alamos, NM USA
[3] Brown Univ, Providence, RI 02912 USA
[4] Oak Ridge Natl Lab, Oak Ridge, TN USA
Keywords
MPI; event-driven; system noise; collective operations; GPU; heterogeneous system; performance
DOI
10.1145/3208040.3208054
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The increasing scale and heterogeneity of high-performance computing (HPC) systems make the performance of Message Passing Interface (MPI) collective communications susceptible to noise and require collectives to adapt to a complex mix of hardware capabilities. State-of-the-art MPI collectives rely heavily on synchronizations; these designs magnify noise across the participating processes, resulting in significant performance slowdowns. This design philosophy must therefore be reconsidered for collectives to run efficiently and robustly on large-scale heterogeneous platforms. In this paper, we present ADAPT, a new collective communication framework in Open MPI that uses event-driven techniques to morph collective algorithms to heterogeneous environments. The core concept of ADAPT is to relax synchronizations while maintaining the minimal data dependencies of MPI collectives. To fully exploit the different bandwidths of the data movement lanes in heterogeneous systems, we extend the ADAPT collective framework with a topology-aware communication tree. This removes the boundaries between different hardware topologies while maximizing the speed of data movement. We evaluate our framework with two popular collective operations, broadcast and reduce, on both CPU and GPU clusters. Our results demonstrate drastic performance improvements and strong resistance to noise compared with other state-of-the-art MPI libraries. In particular, we demonstrate at least 1.3x and 1.5x speedups for CPU data and 2x and 10x speedups for GPU data using ADAPT's event-based broadcast and reduce operations.
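To make the abstract's core idea (relaxing synchronizations while keeping only the minimal data dependencies) concrete, here is a toy discrete-event simulation. It is a sketch under stated assumptions, not ADAPT's implementation and not an Open MPI API. It assumes a broadcast message split into equal segments, a fixed cost `seg_time` per segment per tree link, independent links (no NIC contention), and a crude noise model in which a noisy rank adds a fixed delay before reacting to each event. `event_driven_bcast` forwards each segment to a rank's children as soon as that segment arrives, so the only dependency is parent to child. `lockstep_bcast` is a deliberately simplified synchronized baseline in which no segment starts until the previous one has reached every rank; it does not reproduce any real library's algorithm. All function and parameter names are hypothetical.

```python
import heapq
from collections import defaultdict


def event_driven_bcast(children, root, num_segments, seg_time, noise=None):
    """Toy event-driven, segmented broadcast over a tree (hypothetical model).

    children: dict mapping rank -> list of child ranks
    noise:    dict mapping rank -> fixed delay added before it reacts to an event
    Returns a dict mapping rank -> time at which it holds every segment.
    """
    noise = noise or {}
    link_free = defaultdict(float)  # (src, dst) -> time the link becomes idle
    have = defaultdict(int)         # rank -> number of segments received so far
    done = {}
    events = []                     # min-heap of (arrival time, rank, segment)
    for seg in range(num_segments):
        heapq.heappush(events, (0.0, root, seg))  # root owns all data at t=0
    while events:
        t, rank, seg = heapq.heappop(events)
        have[rank] += 1
        if have[rank] == num_segments:
            done[rank] = t
        # The arrival event triggers forwarding to each child immediately:
        # no waiting for siblings, other subtrees, or later segments.
        ready = t + noise.get(rank, 0.0)
        for child in children.get(rank, []):
            start = max(ready, link_free[(rank, child)])  # FIFO per link
            finish = start + seg_time
            link_free[(rank, child)] = finish
            heapq.heappush(events, (finish, child, seg))
    return done


def lockstep_bcast(children, root, num_segments, seg_time, noise=None):
    """Synchronized baseline: segment s+1 starts only after segment s
    has reached every rank, so one slow rank delays everyone."""
    noise = noise or {}
    arrival = {}
    round_start = 0.0
    for _ in range(num_segments):
        arrival = {root: round_start}
        stack = [root]
        while stack:
            rank = stack.pop()
            for child in children.get(rank, []):
                arrival[child] = arrival[rank] + noise.get(rank, 0.0) + seg_time
                stack.append(child)
        round_start = max(arrival.values())  # implicit global synchronization
    return arrival  # arrival times of the final segment


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}  # 7-rank binary tree
    slow = {1: 10.0}                          # rank 1 suffers system noise
    print("event-driven:", event_driven_bcast(tree, 0, 4, 1.0, slow))
    print("lockstep:    ", lockstep_bcast(tree, 0, 4, 1.0, slow))
```

With 4 segments, `seg_time=1.0`, and a 10-unit delay on rank 1, the event-driven version confines the slowdown to rank 1's subtree (ranks 3 and 4 finish at t=15, while ranks 5 and 6 still finish at t=5, the same as with no noise). In the lockstep version the same delay stretches every round to 12 time units, so even rank 5, which is not below rank 1, finishes at t=38. This mirrors, in miniature, the abstract's claim that synchronization magnifies noise across participating processes.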
Pages: 118-130
Number of pages: 13