RHJoin: A Fast and Space-efficient Join Method for Log Processing in MapReduce

Authors
Tang, Dixin [1]
Liu, Taoying [1]
Liu, Hong [1]
Li, Wei [1]
Affiliation
[1] Chinese Academy of Sciences, Institute of Computing Technology, Beijing, People's Republic of China
Keywords
MapReduce; Join; Log Processing; Big data; MAP-REDUCE; SYSTEM
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Equi-join is heavily used in MapReduce-based log processing. With the rapid growth of dataset sizes, join methods on MapReduce have been studied extensively in recent years. We find that, when faced with huge volumes of log data, existing join methods usually cannot achieve high query performance and affordable storage consumption at the same time: they either optimize one aspect at a significant cost to the other, or they have limited applicability. In this paper, after analyzing the characteristics of the workloads and of the underlying MapReduce framework, we present RHJoin (Repartition Hash Join), a join method with optimizations specific to log processing, together with its implementation on Hadoop. In RHJoin, the reference tables are partitioned in a pre-processing step, the log table is partitioned on the map side, and a hash join is executed on the reduce side. The MapReduce shuffle is also optimized by removing the sort step and by overlapping the execution of mappers and reducers. Comprehensive experiments show that RHJoin achieves high query performance at only a small extra storage cost and is applicable to a wide range of log-processing scenarios.
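The pipeline described in the abstract can be illustrated with a small, single-process Python sketch. It is a simulation under stated assumptions, not the authors' Hadoop implementation: threads stand in for map and reduce tasks, in-memory queues stand in for the shuffle, and all names (`NUM_PARTITIONS`, `partition_of`, `prepartition_reference`, `mapper`, `reducer`, `rh_join`) are hypothetical. It shows three ideas from the abstract: the reference table is split once in advance with the same partition function later applied to log records; each reducer builds a hash table from its reference partition and probes it with log records; and because probing a hash table needs no key order, reducers can consume records unsorted, in arrival order, while mappers are still running.

```python
import queue
import threading
from collections import defaultdict

NUM_PARTITIONS = 3   # hypothetical number of reduce tasks
_DONE = object()     # sentinel a mapper sends when its input split is exhausted


def partition_of(key, n=NUM_PARTITIONS):
    # The same partition function is used for the reference and log tables,
    # so records with equal join keys always meet in the same reducer.
    return hash(key) % n


def prepartition_reference(ref_rows):
    """Pre-processing step: split the reference table once, ahead of queries."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for key, value in ref_rows:
        parts[partition_of(key)].append((key, value))
    return parts


def mapper(split, queues):
    """Map side: route each log record to the reducer that owns its key (no sort)."""
    for key, payload in split:
        queues[partition_of(key)].put((key, payload))
    for q in queues:
        q.put(_DONE)  # tell every reducer this mapper has finished


def reducer(ref_partition, q, num_mappers, out, lock):
    """Reduce side: hash join of one reference partition with streamed log records."""
    # Build phase: possible before any mapper finishes, because the
    # reference partition already exists from the pre-processing step.
    table = defaultdict(list)
    for key, value in ref_partition:
        table[key].append(value)
    # Probe phase: consume log records in arrival order; no sort or merge.
    finished, local = 0, []
    while finished < num_mappers:
        item = q.get()
        if item is _DONE:
            finished += 1
            continue
        key, payload = item
        for value in table.get(key, ()):
            local.append((key, payload, value))
    with lock:
        out.extend(local)


def rh_join(ref_rows, log_splits):
    """Equi-join log records with reference rows; returns (key, log, ref) tuples."""
    ref_parts = prepartition_reference(ref_rows)
    queues = [queue.Queue() for _ in range(NUM_PARTITIONS)]
    out, lock = [], threading.Lock()
    reducers = [
        threading.Thread(target=reducer,
                         args=(ref_parts[i], queues[i], len(log_splits), out, lock))
        for i in range(NUM_PARTITIONS)
    ]
    mappers = [threading.Thread(target=mapper, args=(split, queues))
               for split in log_splits]
    # Reducers start first, so their execution overlaps with the mappers'.
    for t in reducers + mappers:
        t.start()
    for t in mappers + reducers:
        t.join()
    return out


if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob"), (3, "carol")]          # reference table
    logs = [[(1, "login"), (2, "click")], [(1, "logout"), (4, "click")]]  # two splits
    for row in sorted(rh_join(users, logs)):
        print(row)
```

In this sketch the storage trade-off appears as the extra copy of the reference table kept in partitioned form; how RHJoin lays out those partitions on HDFS, and how it modifies Hadoop's shuffle, is not described in the abstract and is not modeled here.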
Pages: 975-980
Number of pages: 6