A High Performance Broadcast Design with Hardware Multicast and GPUDirect RDMA for Streaming Applications on InfiniBand Clusters

Cited by: 0

Authors:

Venkatesh, A. [1]

Subramoni, H. [1]

Hamidouche, K. [1]

Panda, Dhabaleswar K. [1]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC) | 2014

Funding:

U.S. National Science Foundation

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Several streaming applications in the field of high performance computing are obtaining significant speedups in execution time by leveraging the raw compute power offered by modern GPGPUs. This raw compute power, coupled with the high network throughput offered by high performance interconnects such as InfiniBand (IB), is allowing streaming applications to scale rapidly. A frequently used operation that contributes to the execution of multi-node streaming applications is the broadcast operation, where data from a single source, typically a live data site, is transmitted to multiple sinks. Although high performance networks like IB offer novel features such as hardware-based multicast to speed up the broadcast operation, their benefits have been limited to host-based applications due to the inability of IB Host Channel Adapters (HCAs) to directly access the memory of the GPGPUs. This poses a significant performance bottleneck for high performance streaming applications that rely heavily on broadcast operations from GPU memory. The recently introduced GPUDirect RDMA feature alleviates this bottleneck by enabling IB HCAs to perform data transfers directly to/from GPU memory (bypassing host memory). Thus, it presents an attractive option for designing high performance broadcast operations for GPGPU-based high performance streaming applications. In this work, we propose a novel method for fully utilizing GPUDirect RDMA and hardware multicast features in tandem to design a high performance broadcast operation for streaming applications. Experiments conducted with the proposed design show up to a 60% decrease in latency and a 3X-4X improvement in a throughput benchmark compared to the naive scheme on 64 GPU nodes.


Pages: 10
