RHJoin: A Fast and Space-efficient Join Method for Log Processing in MapReduce

Cited by: 0
Authors
Tang, Dixin [1 ]
Liu, Taoying [1 ]
Liu, Hong [1 ]
Li, Wei [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
Source
2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) | 2014
Keywords
MapReduce; Join; Log Processing; Big data; MAP-REDUCE; SYSTEM;
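DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract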
Equi-join is heavily used in MapReduce-based log processing. With the rapid growth of dataset sizes, join methods on MapReduce have been studied extensively in recent years. We find that, when faced with a huge amount of log data, existing join methods usually cannot achieve high query performance and affordable storage consumption at the same time. They either optimize one aspect while significantly sacrificing the other, or have limited applicability. In this paper, after analyzing the characteristics of the workloads and of the underlying MapReduce framework, we present RHJoin (Repartition Hash Join), a join method with specific optimizations for log processing, and its implementation on Hadoop. In RHJoin, reference tables are partitioned in a pre-processing step, the log table is partitioned on the map side, and the hash join is executed on the reduce side. The shuffle procedure of MapReduce is also optimized by removing the sort step and overlapping the execution of mappers and reducers. Comprehensive experiments show that RHJoin achieves high query performance with only a small extra storage cost and is widely applicable to log processing.
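The three stages named in the abstract (pre-partitioned reference table, map-side partitioning of the log table, reduce-side hash join) can be illustrated with a minimal single-process sketch. This is not the authors' Hadoop implementation: the function names, the CRC32-based partitioner, the tuple row layout, and the partition count are all assumptions made for illustration, and the overlapping of mappers and reducers is not modeled.

```python
import zlib
from collections import defaultdict


def partition_of(key, num_partitions):
    """Map a join key to a partition id (hypothetical partitioner, CRC32-based)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_rows(rows, key_index, num_partitions):
    """Group rows by the partition of their join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_of(row[key_index], num_partitions)].append(row)
    return parts


def hash_join_partition(ref_rows, log_rows, ref_key, log_key):
    """Join one partition: build a hash table on the reference rows, probe with log rows."""
    table = defaultdict(list)
    for ref_row in ref_rows:
        table[ref_row[ref_key]].append(ref_row)
    joined = []
    for log_row in log_rows:
        # No sorted input is needed; each log row is a direct hash lookup.
        for ref_row in table.get(log_row[log_key], []):
            joined.append(log_row + ref_row)
    return joined


def rhjoin_sketch(reference, log, ref_key=0, log_key=0, num_partitions=4):
    """Assumed outline of a repartition hash join over in-memory tuples."""
    # Stage 1 (pre-processing in the abstract): partition the reference table once.
    ref_parts = partition_rows(reference, ref_key, num_partitions)
    # Stage 2 (map side): partition the log table with the same partitioner.
    log_parts = partition_rows(log, log_key, num_partitions)
    # Stage 3 (reduce side): hash join each pair of matching partitions.
    result = []
    for p in range(num_partitions):
        result.extend(
            hash_join_partition(ref_parts.get(p, []), log_parts.get(p, []),
                                ref_key, log_key)
        )
    return result


if __name__ == "__main__":
    reference = [(1, "alice"), (2, "bob")]
    log = [(1, "login"), (2, "click"), (1, "logout"), (3, "click")]
    for row in sorted(rhjoin_sketch(reference, log)):
        print(row)
```

Because both tables go through the same partitioner, rows with equal keys always land in the same partition, so each partition can be joined independently. Log rows whose key has no match in the reference table (key 3 in the usage example) are dropped, as in any equi-join.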
Pages: 975 - 980
Number of pages: 6
Related papers
50 in total
  • [21] Space-Efficient SLP Encoding for O(log N)-Time Random Access
    Takasaka, Akito
    Tomohiro, I
    STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2024, 2025, 14899 : 336 - 347
  • [22] A fast and space-efficient boundary element method for computing electrostatic and hydration effects in large molecules
    Tripos, Inc., 1699 S. Hanley Road, St. Louis, MO 63144, United States
    Unknown
    J. Comput. Chem., 7 (864-877):
  • [23] A fast and space-efficient boundary element method for computing electrostatic and hydration effects in large molecules
    Zauhar, RJ
    Varnek, A
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 1996, 17 (07) : 864 - 877
  • [24] A SPACE-EFFICIENT ONLINE METHOD OF COMPUTING QUANTILE ESTIMATES
    PEARL, J
    JOURNAL OF ALGORITHMS, 1981, 2 (02) : 164 - 177
  • [25] ChronoView: A Space-Efficient Method for Visualizing Temporal Patterns
    Misue, Kazuo
    2014 11TH INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS, IMAGING AND VISUALIZATION (CGIV): NEW TECHNIQUES AND TRENDS, 2014, : 1 - 4
  • [26] A fast, space-efficient algorithm for the approximation of images by an optimal sum of Gaussians
    Childs, J
    Lu, CC
    Potter, J
    GRAPHICS INTERFACE 2000, PROCEEDINGS, 2000, : 153 - 162
  • [27] An efficient parallel processing method for skyline queries in MapReduce
    Junsu Kim
    Myoung Ho Kim
    The Journal of Supercomputing, 2018, 74 : 886 - 935
  • [28] An efficient parallel processing method for skyline queries in MapReduce
    Kim, Junsu
    Kim, Myoung Ho
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (02): : 886 - 935
  • [29] An Efficient Two-Table Join Query Processing Based on Extended Bloom Filter in MapReduce
    Wang, Junlu
    Pang, Jun
    Li, Xiaoyan
    Han, Baishuo
    Huang, Lei
    Ding, Linlin
    WEB-AGE INFORMATION MANAGEMENT, 2016, 9998 : 249 - 258
  • [30] Space-Efficient, Fast and Exact Routing in Time-Dependent Road Networks
    Strasser, Ben
    Wagner, Dorothea
    Zeitz, Tim
    ALGORITHMS, 2021, 14 (03)