Efficient processing distributed joins with bloomfilter using MapReduce

Cited by: 0
Author affiliation
School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
Source
Int. J. Grid Distrib. Comput., 2013, vol. 3, pp. 43-58
Keywords
Efficiency
DOI
Not available
Abstract
The MapReduce framework has been widely used to process and analyze large-scale datasets over large clusters. The join operation over such clusters is an essential problem and has attracted increasing attention in recent years as MapReduce has been adopted. Many strategies have been proposed to improve the efficiency of distributed joins, among which the Bloom filter is a successful one. However, the Bloom filter's potential has not yet been fully exploited, especially in the MapReduce environment. In this paper, three strategies are presented for building a Bloom filter over large datasets using MapReduce. Based on these strategies, we design two algorithms for two-way joins and one algorithm for multi-way joins. The experimental results show that our algorithms can significantly improve the efficiency of current join algorithms. Moreover, cost models of these algorithms are characterized in order to identify ways of improving the performance of two-way and multi-way joins.
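To illustrate the general idea the abstract refers to, here is a minimal single-process sketch of a Bloom-filter-assisted two-way equi-join. It is not one of the paper's algorithms or building strategies: the class `BloomFilter`, the function `bloom_join`, and the parameters `num_bits` and `num_hashes` are hypothetical names chosen for this example, and the map, shuffle, and reduce phases are simulated with ordinary Python loops rather than a real MapReduce runtime.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Toy Bloom filter (illustrative only; sizes are assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def bloom_join(small, large, num_bits=1024, num_hashes=3):
    """Equi-join two lists of (key, value) pairs, pruning `large` early."""
    # Build the filter over the join keys of the smaller relation.
    bf = BloomFilter(num_bits, num_hashes)
    for key, _ in small:
        bf.add(key)

    # Simulated map + shuffle: group values by key, dropping records of
    # the larger relation whose key cannot match anything in `small`.
    shuffled = defaultdict(lambda: ([], []))
    for key, value in small:
        shuffled[key][0].append(value)
    for key, value in large:
        if bf.might_contain(key):
            shuffled[key][1].append(value)

    # Simulated reduce: exact per-key cross product, so any Bloom filter
    # false positive is discarded here (its left-hand list is empty).
    result = []
    for key, (left, right) in shuffled.items():
        for lv in left:
            for rv in right:
                result.append((key, lv, rv))
    return result
```

In a real cluster the benefit would come from the filtered records never being shuffled across the network; false positives only cost some extra traffic and do not affect the correctness of the join result.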