Efficient processing distributed joins with bloomfilter using MapReduce

Cited by: 0
Authors
Institution
School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China [1]
Source
Int. J. Grid Distrib. Comput. | 2013 / Vol. 3 / pp. 43-58
Keywords
Efficiency;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The MapReduce framework has been widely used to process and analyze large-scale datasets over large clusters. As a fundamental operation, the join across large clusters has attracted growing attention in recent years owing to the adoption of MapReduce. Many strategies have been proposed to improve the efficiency of distributed joins, among which the Bloom filter is a successful one. However, the Bloom filter's potential has not yet been fully exploited, especially in the MapReduce environment. In this paper, three strategies are presented for building the Bloom filter for large datasets using MapReduce. Based on these strategies, we design two algorithms for two-way joins and one algorithm for multi-way joins. The experimental results show that our algorithms can significantly improve the efficiency of current join algorithms. Moreover, cost models of these algorithms are characterized in order to identify ways of improving the performance of two-way and multi-way joins.
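For readers unfamiliar with the general idea the abstract refers to, the following is a minimal single-process sketch of a Bloom-filter-assisted two-way equi-join. It is not one of the paper's algorithms or build strategies, which the abstract does not detail. The names `BloomFilter`, `bloom_join`, `num_bits`, and `num_hashes` are hypothetical, and the map, shuffle, and reduce phases are simulated with plain Python collections rather than a real MapReduce cluster.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Toy Bloom filter (hypothetical helper, sizes chosen arbitrarily)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


def bloom_join(small, large):
    """Join two lists of (key, value) records on key.

    Returns the sorted join rows and the number of records from
    `large` that survived the filter (i.e. would be shuffled).
    """
    # Build the filter from the join keys of the smaller input.
    bf = BloomFilter()
    for key, _ in small:
        bf.add(key)

    # Simulated map phase: drop records of the larger input whose key
    # cannot match, so they are never sent across the network.
    survivors = [(k, v) for k, v in large if bf.might_contain(k)]

    # Simulated shuffle: group both inputs by join key.
    groups = defaultdict(lambda: ([], []))
    for k, v in small:
        groups[k][0].append(v)
    for k, v in survivors:
        groups[k][1].append(v)

    # Simulated reduce phase: cross product per key. False positives
    # produce no rows because their left-hand list is empty.
    rows = [(k, l, r) for k, (ls, rs) in groups.items() for l in ls for r in rs]
    return sorted(rows), len(survivors)


if __name__ == "__main__":
    rows, shipped = bloom_join(
        [(1, "a"), (2, "b")],
        [(2, "x"), (3, "y"), (2, "z"), (4, "w")],
    )
    print(rows, shipped)
```

The join result is unaffected by false positives; they only raise the number of records shuffled, which is why the filter's size and hash count matter for the communication cost.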
Related papers
50 in total
  • [1] Efficient Processing Distributed Joins with Bloomfilter using MapReduce
    Zhang, Changchun
    Wu, Lei
    Li, Jing
    [J]. INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2013, 6 (03): : 43 - 57
  • [2] Efficient Processing of k Nearest Neighbor Joins using MapReduce
    Lu, Wei
    Shen, Yanyan
    Chen, Su
    Ooi, Beng Chin
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (10): : 1016 - 1027
  • [3] Efficient Processing of Top-k Joins in MapReduce
    Saouk, Mei
    Doulkeridis, Christos
    Vlachou, Akrivi
    Norvag, Kjetil
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 570 - 577
  • [4] Optimizing Distributed Joins with Bloom Filters Using MapReduce
    Zhang, Changchun
    Wu, Lei
    Li, Jing
    [J]. COMPUTER APPLICATIONS FOR GRAPHICS, GRID COMPUTING, AND INDUSTRIAL ENVIRONMENT, 2012, 351 : 88 - 95
  • [5] Strategic and suave processing for performing similarity joins using MapReduce
    Mahalakshmi Lakshminarayanan
    William F. Acosta
    Robert C. Green
    Vijay Devabhaktuni
    [J]. The Journal of Supercomputing, 2014, 69 : 930 - 954
  • [6] Strategic and suave processing for performing similarity joins using MapReduce
    Lakshminarayanan, Mahalakshmi
    Acosta, William F.
    Green, Robert C., II
    Devabhaktuni, Vijay
    [J]. JOURNAL OF SUPERCOMPUTING, 2014, 69 (02): : 930 - 954
  • [7] Efficient processing of distributed Iceberg Semi-Joins
    Imthiyaz, MK
    Dong, XA
    Kalnis, P
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, 3180 : 634 - 643
  • [8] Fuzzy Joins Using MapReduce
    Afrati, Foto N.
    Das Sarma, Anish
    Menestrina, David
    Parameswaran, Aditya
    Ullman, Jeffrey D.
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 498 - 509
  • [9] Efficient and Scalable Graph Similarity Joins in MapReduce
    Chen, Yifan
    Zhao, Xiang
    Xiao, Chuan
    Zhang, Weiming
    Tang, Jiuyang
    [J]. SCIENTIFIC WORLD JOURNAL, 2014,
  • [10] Efficient Large Outer Joins over MapReduce
    Cheng, Long
    Kotoulas, Spyros
    [J]. EURO-PAR 2016: PARALLEL PROCESSING, 2016, 9833 : 334 - 346