A High Performance Broadcast Design with Hardware Multicast and GPUDirect RDMA for Streaming Applications on Infiniband Clusters

Authors
Venkatesh, A. [1]
Subramoni, H. [1]
Hamidouche, K. [1]
Panda, Dhabaleswar K. [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
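DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract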
Several streaming applications in the field of high performance computing are obtaining significant speedups in execution time by leveraging the raw compute power offered by modern GPGPUs. This raw compute power, coupled with the high network throughput offered by high performance interconnects such as InfiniBand (IB), is allowing streaming applications to scale rapidly. A frequently used operation in the execution of multi-node streaming applications is the broadcast operation, where data from a single source, typically a live data site, is transmitted to multiple sinks. Although high performance networks like IB offer novel features such as hardware-based multicast to speed up the broadcast operation, their benefits have been limited to host-based applications because IB Host Channel Adapters (HCAs) could not directly access the memory of GPGPUs. This poses a significant performance bottleneck for high performance streaming applications that rely heavily on broadcast operations from GPU memory. The recently introduced GPUDirect RDMA feature alleviates this bottleneck by enabling IB HCAs to perform data transfers directly to and from GPU memory, bypassing host memory. It therefore presents an attractive option for designing high performance broadcast operations for GPGPU-based streaming applications. In this work, we propose a novel method that uses the GPUDirect RDMA and hardware multicast features in tandem to design a high performance broadcast operation for streaming applications. Experiments conducted with the proposed design show up to a 60% decrease in latency and a 3X-4X improvement in a throughput benchmark compared to the naive scheme on 64 GPU nodes.
Pages: 10