Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast

Cited by: 9
Authors
Chu, Ching-Hsiang [1]
Lu, Xiaoyi [1]
Awan, Ammar A. [1]
Subramoni, Hari [1]
Elton, Bracy [2]
Panda, Dhabaleswar K. [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Engility Corp, Dayton, OH 45433 USA
Keywords
Broadcast; deep learning; hardware multicast; GPU; GPUDirect RDMA; heterogeneous broadcast; streaming
DOI
10.1109/TPDS.2018.2867222
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Broadcast is a widely used operation in many streaming and deep learning applications to disseminate large amounts of data on emerging heterogeneous High-Performance Computing (HPC) systems. However, traditional broadcast schemes do not fully utilize hardware features for Graphics Processing Unit (GPU)-based applications. In this paper, a model-oriented analysis is presented to identify performance bottlenecks of existing broadcast schemes on GPU clusters. Next, streaming-based broadcast schemes are proposed to exploit InfiniBand hardware multicast (IB-MCAST) and NVIDIA GPUDirect technology for efficient message transmission. The proposed designs are evaluated in the context of using Message Passing Interface (MPI) based benchmarks and applications. The experimental results indicate improved scalability and up to 82 percent reduction of latency compared to the state-of-the-art solutions in the benchmark-level evaluation. Furthermore, compared to the state-of-the-art, the proposed design yields stable higher throughput for a synthetic streaming workload, and 1.3x faster training time for a deep learning framework.
Pages: 575-588
Page count: 14
Related papers
50 in total
  • [1] A High Performance Broadcast Design with Hardware Multicast and GPUDirect RDMA for Streaming Applications on Infiniband Clusters
    Venkatesh, A.
    Subramoni, H.
    Hamidouche, K.
    Panda, Dhabaleswar K.
    [J]. 2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2014,
  • [2] Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters
    Hamidouche, Khaled
    Venkatesh, Akshay
    Awan, Ammar Ahmad
    Subramoni, Hari
    Chu, Ching-Hsiang
    Panda, Dhabaleswar K.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 78 - 87
  • [3] Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs
    Potluri, Sreeram
    Hamidouche, Khaled
    Venkatesh, Akshay
    Bureddy, Devendar
    Panda, Dhabaleswar K.
    [J]. 2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2013, : 80 - 89
  • [4] Efficient Reliability Support for Hardware Multicast-based Broadcast in GPU-enabled Streaming Applications
    Chu, C. -H.
    Hamidouche, K.
    Subramoni, H.
    Venkatesh, A.
    Elton, B.
    Panda, D. K.
    [J]. PROCEEDINGS OF FIRST WORKSHOP ON OPTIMIZATION OF COMMUNICATION IN HPC RUNTIME SYSTEMS (COM-HPC 2016), 2016, : 29 - 38
  • [5] Distributed Join Algorithms on Multi-GPU Clusters with GPUDirect RDMA
    Guo, Chengxin
    Chen, Hong
    Zhang, Feng
    Li, Cuiping
    [J]. PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019,
  • [6] Efficient multicast support exploiting mobility of hosts
    Choo, YY
    Huh, Y
    Kim, C
    [J]. IEICE TRANSACTIONS ON COMMUNICATIONS, 2002, E85B (06) : 1213 - 1217
  • [7] A novel power-efficient broadcast routing algorithm exploiting broadcast efficiency
    Kang, I
    Poovendran, R
    [J]. 2003 IEEE 58TH VEHICULAR TECHNOLOGY CONFERENCE, VOLS1-5, PROCEEDINGS, 2003, : 2926 - 2930
  • [8] Exploiting client bandwidth for more efficient video broadcast
    Hua, KA
    Cai, Y
    Sheu, S
    [J]. 7TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS - PROCEEDINGS, 1998, : 848 - 856
  • [9] Energy-Efficient Broadcast and Multicast Trees in Wireless Networks
    Jeffrey E. Wieselthier
    Gam D. Nguyen
    Anthony Ephremides
    [J]. Mobile Networks and Applications, 2002, 7 : 481 - 492
  • [10] Energy efficient broadcast and multicast trees for reliable wireless communication
    Banerjee, S
    Misra, A
    Yeo, JW
    Agrawala, A
    [J]. WCNC 2003: IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE RECORD, VOLS 1-3, 2003, : 660 - 667