High-Performance Design of Hadoop RPC with RDMA over InfiniBand

Cited by: 60
Authors
Lu, Xiaoyi [1]
Islam, Nusrat S. [1]
Wasi-ur-Rahman, Md [1]
Jose, Jithin [1]
Subramoni, Hari [1]
Wang, Hao [1]
Panda, Dhabaleswar K. [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
DOI
10.1109/ICPP.2013.78
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Hadoop RPC is the basic communication mechanism in the Hadoop ecosystem. It is used with other Hadoop components such as MapReduce, HDFS, and HBase in real-world data centers, e.g., at Facebook and Yahoo!. However, the current Hadoop RPC design is built on the Java sockets interface, which limits its potential performance. The High Performance Computing community has exploited high-throughput, low-latency networks such as InfiniBand for many years. In this paper, we first analyze the performance of the current Hadoop RPC design by unearthing buffer management and communication bottlenecks that are not apparent on slower networks. Then we propose a novel design (RPCoIB) of Hadoop RPC with RDMA over InfiniBand networks. RPCoIB provides a JVM-bypassed buffer management scheme and utilizes message size locality to avoid multiple memory allocations and copies in data serialization and deserialization. Our performance evaluations reveal that the basic ping-pong latencies for varied data sizes are reduced by 42%-49% and 46%-50% compared with 10GigE and IPoIB QDR (32 Gbps), respectively, while the RPCoIB design also improves the peak throughput by 82% and 64% compared with 10GigE and IPoIB, respectively. Compared to default Hadoop over IPoIB QDR, our RPCoIB design improves the performance of the Sort benchmark on 64 compute nodes by 15% and the performance of the CloudBurst application by 10%. We also present thorough, integrated evaluations of our RPCoIB design with other research directions that optimize HDFS and HBase using RDMA over InfiniBand. Compared with their best performance, we observe a 10% improvement for HDFS-IB and a 24% improvement for HBase-IB. To the best of our knowledge, this is the first such design of the Hadoop RPC system over high-performance networks such as InfiniBand.
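The abstract's idea of using message size locality to avoid repeated allocations can be illustrated with a small sketch. This is not the RPCoIB implementation, which the record does not describe in detail; it only assumes that consecutive messages of the same call kind tend to have similar sizes, so a buffer sized by the previous message can usually be reused. The class name `SizeLocalityBufferPool`, its methods, and the call-kind key `"exampleCall"` are all hypothetical, and a direct `ByteBuffer` stands in for whatever off-heap memory a JVM-bypassed scheme would actually register with the network adapter.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: reuse off-heap buffers sized by the last message of each call kind. */
public class SizeLocalityBufferPool {
    // Last observed serialized size per call kind (assumed key, e.g. an RPC method name).
    private final Map<String, Integer> lastSize = new HashMap<>();
    // One cached direct (off-heap) buffer per call kind.
    private final Map<String, ByteBuffer> cached = new HashMap<>();
    private final int defaultSize;
    private int allocations = 0;

    public SizeLocalityBufferPool(int defaultSize) {
        this.defaultSize = defaultSize;
    }

    /** Returns a cleared direct buffer whose capacity is at least the size hint. */
    public ByteBuffer acquire(String callKind) {
        int hint = lastSize.getOrDefault(callKind, defaultSize);
        ByteBuffer buf = cached.remove(callKind);
        if (buf == null || buf.capacity() < hint) {
            // Allocate only when no cached buffer is large enough for the hint.
            buf = ByteBuffer.allocateDirect(hint);
            allocations++;
        }
        buf.clear();
        return buf;
    }

    /** Records the actual message size as the next hint and caches the buffer for reuse. */
    public void release(String callKind, ByteBuffer buf, int actualSize) {
        lastSize.put(callKind, actualSize);
        cached.put(callKind, buf);
    }

    public int allocationCount() {
        return allocations;
    }

    public static void main(String[] args) {
        SizeLocalityBufferPool pool = new SizeLocalityBufferPool(1024);
        for (int i = 0; i < 3; i++) {
            ByteBuffer b = pool.acquire("exampleCall");
            b.put(new byte[512]); // stand-in for a serialized request
            pool.release("exampleCall", b, b.position());
        }
        // Three same-sized messages need only one allocation.
        System.out.println("allocations = " + pool.allocationCount());
    }
}
```

Running `main` prints `allocations = 1`: the first call allocates a buffer, and the next two reuse it because the recorded size fits the cached capacity. A real design would also have to handle messages that outgrow the hint partway through serialization, which this sketch leaves to the caller.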
Pages: 641-650
Number of pages: 10