ADAPT: An Event-Based Adaptive Collective Communication Framework

Cited by: 15
Authors
Luo, Xi [1]
Wu, Wei [2]
Bosilca, George [1]
Patinyasakdikul, Thananon [1]
Wang, Linnan [3]
Dongarra, Jack [1, 4]
Affiliations
[1] Univ Tennessee, Knoxville, TN 37996 USA
[2] Los Alamos Natl Lab, Los Alamos, NM USA
[3] Brown Univ, Providence, RI 02912 USA
[4] Oak Ridge Natl Lab, Oak Ridge, TN USA
Keywords
MPI; event-driven; system noise; collective operations; GPU; heterogeneous system; PERFORMANCE
DOI
10.1145/3208040.3208054
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The increasing scale and heterogeneity of high-performance computing (HPC) systems make the performance of Message Passing Interface (MPI) collective communications susceptible to noise and require collectives to adapt to a complex mix of hardware capabilities. State-of-the-art MPI collective designs rely heavily on synchronization; these designs magnify noise across the participating processes, resulting in significant performance slowdown. This design philosophy must therefore be reconsidered in order to run efficiently and robustly on large-scale heterogeneous platforms. In this paper, we present ADAPT, a new collective communication framework in Open MPI that uses event-driven techniques to morph collective algorithms to heterogeneous environments. The core concept of ADAPT is to relax synchronization while maintaining the minimal data dependencies of MPI collectives. To fully exploit the different bandwidths of data movement lanes in heterogeneous systems, we extend the ADAPT collective framework with a topology-aware communication tree. This removes the boundaries between different hardware topologies while maximizing the speed of data movement. We evaluate our framework with two popular collective operations, broadcast and reduce, on both CPU and GPU clusters. Our results demonstrate drastic performance improvements and strong resistance to noise compared with other state-of-the-art MPI libraries. In particular, we demonstrate at least 1.3x and 1.5x speedup for CPU data and 2x and 10x speedup for GPU data using ADAPT's event-based broadcast and reduce operations.
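The idea of relaxing synchronization while keeping only the minimal data dependencies can be illustrated with a toy discrete-event simulation. The sketch below is not ADAPT's implementation and uses no MPI calls. The function name `event_driven_broadcast`, the binary tree layout, the unit link cost, and the `delay` noise model are all assumptions made for illustration. Each rank forwards a segment to its children as soon as that segment arrives, so a delayed rank slows only its own subtree.

```python
import heapq


def event_driven_broadcast(num_ranks, num_segments, link_cost=1.0, delay=None):
    """Toy simulation of a segmented, event-driven tree broadcast.

    Hypothetical model (not taken from the paper): rank r has children
    2r+1 and 2r+2, every link costs `link_cost` per segment, and
    `delay[rank]` is extra time that rank needs before forwarding a
    segment (a stand-in for system noise).
    Returns {rank: time at which the rank holds all segments}.
    """
    delay = delay or {}
    received = {r: set() for r in range(num_ranks)}
    finish = {}

    # Event queue entries: (arrival_time, rank, segment).
    # The root (rank 0) holds every segment at time 0.
    events = [(0.0, 0, seg) for seg in range(num_segments)]
    heapq.heapify(events)

    while events:
        now, rank, seg = heapq.heappop(events)
        received[rank].add(seg)
        if len(received[rank]) == num_segments:
            finish[rank] = now

        # "Callback" triggered by the arrival event: forward this segment
        # right away. The only dependency is that a segment must be
        # received before it is forwarded; no rank waits on its siblings.
        send_time = now + delay.get(rank, 0.0)
        for child in (2 * rank + 1, 2 * rank + 2):
            if child < num_ranks:
                heapq.heappush(events, (send_time + link_cost, child, seg))

    return finish


if __name__ == "__main__":
    # Rank 1 is "noisy": it needs 5 extra time units before forwarding.
    times = event_driven_broadcast(7, 4, delay={1: 5.0})
    for rank in sorted(times):
        print(rank, times[rank])
```

In this model only ranks 3 and 4, which depend on the noisy rank 1 for their data, finish late. Ranks 2, 5, and 6 have no data dependency on rank 1 and are unaffected, whereas a design that synchronizes all participants at each step would make every rank wait.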
Pages: 118-130
Page count: 13