High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters

Citations: 0

Authors
Li, Mingzhe [1 ]
Hamidouche, Khaled [1 ]
Lu, Xiaoyi [1 ]
Lin, Jian [1 ]
Panda, Dhabaleswar K. [1 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
DOI
10.1007/978-3-662-48096-0_48
Chinese Library Classification: TP [Automation technology; computer technology]
Discipline code: 0812
Abstract
Intel Many Integrated Core (MIC) architectures play a key role in modern supercomputing systems thanks to their high performance and low power consumption, making them an attractive choice for accelerating HPC applications. MPI-3 RMA is an important part of the MPI-3 standard: its one-sided semantics reduce synchronization overhead and allow communication to overlap with computation, which makes the RMA model a prime target for developing scalable applications with irregular communication patterns. However, efficient runtime support for MPI-3 RMA with simultaneous use of both processors and co-processors has not been well explored. In this paper, we propose high-performance and scalable runtime-level designs for MPI-3 RMA involving both the host and Xeon Phi processors, and we incorporate these designs into the popular MVAPICH2 MPI library. To the best of our knowledge, this is the first work to propose high-performance runtime designs for MPI-3 RMA on Intel Xeon Phi clusters. Experimental evaluations show a 5X reduction in uni-directional MPI_Put and MPI_Get latency for 4 MB messages between two Xeon Phis, compared to an out-of-the-box version of MVAPICH2. Application evaluations in symmetric mode show performance improvements of 25% at a scale of 1,024 processes.
Pages: 625-637 (13 pages)