MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives

Times cited: 0
Authors
Graham, Richard L. [1]
Shipman, Galen [1]
Affiliation
[1] Oak Ridge Natl Lab, Oak Ridge, TN USA
Keywords
Collectives; Shared-Memory; MPI_Bcast; MPI_Reduce; MPI_Allreduce
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
With local core counts on the rise, taking advantage of shared memory to optimize collective operations can improve performance. We study several on-host, shared-memory-optimized algorithms for MPI_Bcast, MPI_Reduce, and MPI_Allreduce, using tree-based and reduce-scatter algorithms. For small-data operations, where synchronization costs are relatively large, fan-in/fan-out algorithms generally perform best. For large messages, data manipulation constitutes the largest cost, and reduce-scatter algorithms are best for reductions. These optimizations improve performance by up to a factor of three. Memory- and cache-sharing effects require deliberate process layout and careful radix selection for tree-based methods.
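The two algorithm families contrasted above can be illustrated with a minimal Python sketch. It is a serial simulation built on stated assumptions, not the authors' implementation. Each simulated rank is a list in `bufs` that stands in for that rank's slot in a shared-memory segment. The reduction operation is element-wise sum. The function names `fanin_reduce`, `fanout_bcast`, and `rs_allreduce` are hypothetical. A real on-host collective would run the ranks concurrently and synchronize with flags or barriers, and this sketch does not model that.

```python
# Serial simulation of on-host shared-memory collectives (illustrative only).
# bufs[r] stands in for rank r's slot in a shared-memory segment; the
# reduction operation is assumed to be element-wise sum.


def fanin_reduce(bufs, k):
    """Radix-k fan-in tree reduction; afterwards bufs[0] holds the sum.

    At stride s = 1, k, k^2, ..., each rank r that is a multiple of s*k
    adds in the buffers of its children r + j*s (j = 1..k-1).
    Interior-node buffers are overwritten with partial sums.
    """
    if k < 2:
        raise ValueError("radix must be at least 2")
    p, s = len(bufs), 1
    while s < p:
        for r in range(0, p, s * k):
            for j in range(1, k):
                child = r + j * s
                if child >= p:
                    break
                bufs[r] = [a + b for a, b in zip(bufs[r], bufs[child])]
        s *= k
    return bufs[0]


def fanout_bcast(bufs, k):
    """Radix-k fan-out: copy bufs[0] to every rank, largest stride first."""
    if k < 2:
        raise ValueError("radix must be at least 2")
    p, s = len(bufs), 1
    while s * k < p:  # largest power of k below p, matching fanin_reduce
        s *= k
    while s >= 1:
        for r in range(0, p, s * k):
            for j in range(1, k):
                child = r + j * s
                if child >= p:
                    break
                bufs[child] = list(bufs[r])
        s //= k
    return bufs


def rs_allreduce(bufs):
    """Reduce-scatter followed by allgather.

    Rank r owns elements [r*n//p, (r+1)*n//p) and sums only that segment
    across all ranks, so the reduction work is split p ways; every rank
    then receives a copy of the assembled result.
    """
    p, n = len(bufs), len(bufs[0])
    result = [0] * n
    for r in range(p):  # reduce-scatter phase
        for i in range(r * n // p, (r + 1) * n // p):
            result[i] = sum(b[i] for b in bufs)
    return [list(result) for _ in range(p)]  # allgather phase
```

In this sketch, the fan-in root performs at most (k-1)·⌈log_k P⌉ additions of full N-element vectors. In the reduce-scatter variant, each rank reads only about N elements in total (N/P from each of P buffers). That division of the data-manipulation work is one plausible reading of why reduce-scatter is reported as best for large reductions. The radix k is exposed as a parameter because radix selection matters for tree-based methods.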
Pages: 130-140
Number of pages: 11
Related papers (50 in total)
  • [1] MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI plus MPI Parallel Codes
    Zhou, Huan
    Gracia, Jose
    Schneider, Ralf
    [J]. PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019), 2019
  • [2] Optimizing Multi-Core MPI Collectives with SMARTMAP
    Brightwell, Ron
    Pedretti, Kevin
    [J]. 2009 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPPW 2009), 2009: 370-377
  • [3] Multi-core Aware Optimization for MPI Collectives
    Tu, Bibo
    Zou, Ming
    Zhan, Hanfeng
    Zhao, Xiaofang
    Fan, Hanping
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2008: 322-325
  • [4] Redesigning MPI shared memory communication for large multi-core architecture
    Luo, Miao
    Wang, Hao
    Vienne, Jerome
    Panda, Dhabaleswar K.
    [J]. COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2013, 28 (2-3): 137-146
  • [5] EXPLOITING DIRECT ACCESS SHARED MEMORY FOR MPI ON MULTI-CORE PROCESSORS
    Brightwell, Ron
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2010, 24 (01): 69-77
  • [6] Decoupled MapReduce for Shared-Memory Multi-Core Architectures
    Iliakis, Konstantinos
    Xydis, Sotirios
    Soudris, Dimitrios
    [J]. IEEE COMPUTER ARCHITECTURE LETTERS, 2018, 17 (02): 143-146
  • [7] Parallel Shared-Memory Workloads Performance on Asymmetric Multi-core Architectures
    Madruga, Felipe L.
    Freitas, Henrique C.
    Navaux, Philippe O. A.
    [J]. PROCEEDINGS OF THE 18TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2010: 163-169
  • [8] Addressing Resource Contention and Timing Predictability for Multi-Core Architectures with Shared Memory Interconnects
    Wang, Haitong
    Audsley, Neil C.
    Chang, Wanli
    [J]. 2020 IEEE REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS 2020), 2020: 70-81
  • [9] Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures
    Hashmi, Jahanzeb Maqbool
    Chakraborty, Sourav
    Bayatpour, Mohammadreza
    Subramoni, Hari
    Panda, Dhabaleswar K.
    [J]. 2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019: 410-419
  • [10] Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows
    Potluri, Sreeram
    Wang, Hao
    Dhanraj, Vijay
    Sur, Sayantan
    Panda, Dhabaleswar K.
    [J]. RECENT ADVANCES IN THE MESSAGE PASSING INTERFACE, 2011, 6960: 99-109