RHJoin: A Fast and Space-efficient Join Method for Log Processing in MapReduce

Cited by: 0
Authors
Tang, Dixin [1 ]
Liu, Taoying [1 ]
Liu, Hong [1 ]
Li, Wei [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
Keywords
MapReduce; Join; Log Processing; Big data; MAP-REDUCE; SYSTEM;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Equi-join is heavily used in MapReduce-based log processing. With the rapid growth of dataset sizes, join methods on MapReduce have been studied extensively in recent years. We find that, when faced with a huge amount of log data, existing join methods usually cannot achieve high query performance and affordable storage consumption at the same time. They either optimize one aspect while significantly sacrificing the other, or have limited applicability. In this paper, after analyzing the characteristics of the workloads and of the underlying MapReduce framework, we present RHJoin (Repartition Hash Join), a join method with specific optimizations for log processing, and its implementation on Hadoop. In RHJoin, reference tables are partitioned in a pre-processing step, the log table is partitioned on the map side, and the hash join is executed on the reduce side. The shuffle procedure of MapReduce is also optimized by removing the sort step and overlapping the execution of mappers and reducers. Comprehensive experiments show that RHJoin achieves high query performance with only a small extra storage cost, and that it applies to a wide range of log processing scenarios.
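The three stages named in the abstract (pre-partitioned reference table, map-side partitioning of the log table, reduce-side hash join) can be illustrated with a small single-process sketch. This is not the authors' Hadoop implementation: the function names, the dictionary row format, and the `user_id` join key are assumptions made for illustration, and the shuffle optimizations (removing the sort step, overlapping mappers and reducers) are not modeled.

```python
from collections import defaultdict


def partition(rows, key, num_partitions):
    """Split rows into buckets by hashing the join key."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[key]) % num_partitions].append(row)
    return buckets


def hash_join_partition(ref_rows, log_rows, key):
    """Join one partition: build a hash table on the reference rows, probe with log rows."""
    table = defaultdict(list)
    for ref in ref_rows:
        table[ref[key]].append(ref)
    joined = []
    for log in log_rows:
        for ref in table.get(log[key], []):
            joined.append({**ref, **log})
    return joined


def repartition_hash_join(reference, log, key, num_partitions=4):
    """Hypothetical in-memory stand-in for the three stages described in the abstract."""
    ref_parts = partition(reference, key, num_partitions)  # pre-processing step
    log_parts = partition(log, key, num_partitions)        # map side
    result = []
    for ref_p, log_p in zip(ref_parts, log_parts):         # one "reducer" per partition
        result.extend(hash_join_partition(ref_p, log_p, key))
    return result


# Illustrative data only; not taken from the paper.
reference = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
log = [
    {"user_id": 1, "url": "/x"},
    {"user_id": 1, "url": "/y"},
    {"user_id": 3, "url": "/z"},
]
result = repartition_hash_join(reference, log, "user_id")
```

Because both tables are partitioned with the same hash function, matching keys always land in the same partition, so each partition can be joined independently and no sort of the log rows is needed.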
Pages: 975 - 980
Number of pages: 6
Related papers
50 in total
  • [31] Fast and space-efficient shapelets-based time-series classification
    Gordon, Daniel
    Hendler, Danny
    Rokach, Lior
    INTELLIGENT DATA ANALYSIS, 2015, 19 (05) : 953 - 981
  • [32] Space-Efficient Approximate String Matching Allowing Inversions in Fast Average Time
    Kim, Hwee
    Han, Yo-Sub
    FRONTIERS IN ALGORITHMICS, FAW 2014, 2014, 8497 : 141 - 150
  • [33] Fast and Space-Efficient Defense against Jump-oriented Programming Attacks
    Kim, Jeehong
    Eom, Young Ik
    2015 INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2015, : 7 - 10
  • [34] Recent Results on Processing Random-Order Streams and Space-Efficient Sampling
    McGregor, Andrew
    2008 46TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, VOLS 1-3, 2008, : 206 - 208
  • [35] GreenBFS: Space-Efficient BFS Engine for Power-aware Graph Processing
    Gan, Xinbiao
    Guo, Peilin
    Wu, Guang
    Li, Tiejun
    2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022, : 489 - 496
  • [36] A Time and Space-Efficient Compositional Method for Prime and Test Paths Generation
    Fazli, Ebrahim
    Afsharchi, Mohsen
    IEEE ACCESS, 2019, 7 : 134399 - 134410
  • [37] Space-efficient routing tables for almost all networks and the incompressibility method
    Buhrman, H
    Hoepman, JH
    Vitányi, P
    SIAM JOURNAL ON COMPUTING, 1999, 28 (04) : 1414 - 1432
  • [38] Fast and space-efficient taxonomic classification of long reads with hierarchical interleaved XOR filters
    Ulrich, Jens-Uwe
    Renard, Bernhard Y.
    GENOME RESEARCH, 2024, 34 (06) : 914 - 924
  • [39] Fast-join: An efficient method for fuzzy token matching based string similarity join
    Wang, Jiannan
    Li, Guoliang
    Feng, Jianhua
    Proceedings - International Conference on Data Engineering, 2011, : 458 - 469
  • [40] Fast-Join: An Efficient Method for Fuzzy Token Matching based String Similarity Join
    Wang, Jiannan
    Li, Guoliang
    Feng, Jianhua
    IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 458 - 469