Strategic and suave processing for performing similarity joins using MapReduce

Cited by: 1
Authors
Lakshminarayanan, Mahalakshmi [1]
Acosta, William F. [2]
Green, Robert C., II [3]
Devabhaktuni, Vijay [1]
Affiliations
[1] Univ Toledo, Toledo, OH 43606 USA
[2] Harman Int, Vernon Hills, IL 60061 USA
[3] Bowling Green State Univ, Dept Comp Sci, Bowling Green, OH 43403 USA
Source
JOURNAL OF SUPERCOMPUTING | 2014 / Vol. 69 / Issue 02
Keywords
Similarity Joins; Multisets; MapReduce
DOI
10.1007/s11227-014-1197-7
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
An efficient MapReduce algorithm for performing similarity joins between multisets is proposed. Filtering techniques for similarity joins minimize the number of pairs of entities joined and hence improve the efficiency of the algorithm. Multisets represent real-world data better than sets because they account for the frequency of their elements. Prior serial algorithms incorporate filtering techniques only for sets, not multisets, while prior MapReduce algorithms either incorporate no filtering technique or incorporate prefix filtering in an inefficient and unscalable way. This work extends the prefix, size, and positional filtering techniques to multisets and achieves the challenging task of incorporating them efficiently in the shared-nothing MapReduce model, thereby minimizing the pairs generated and joined and improving I/O, network, and computational efficiency. A technique to enhance the scalability of the algorithm is also presented as a contingency measure. The algorithms are developed using Hadoop and tested on real-world Twitter data. Experimental results demonstrate unprecedented performance gain.
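To make the filtering idea in the abstract concrete, here is a minimal single-process sketch of size and prefix filtering applied to multisets. It is not the paper's MapReduce algorithm. It assumes multiset Jaccard similarity (sum of minimum counts over sum of maximum counts) and a common textbook trick of expanding each multiset into (element, occurrence) tokens so that ordinary set prefix filtering applies. The positional filter is omitted, and all function names and the sample data are hypothetical.

```python
import math
from collections import Counter

EPS = 1e-9  # guards against floating-point noise in threshold * size


def expand(multiset):
    """Expand a Counter into sorted (element, occurrence) tokens.

    An element with count k becomes (e, 1) .. (e, k), so the set overlap
    of two expansions equals the multiset intersection size.
    """
    return sorted((e, i) for e, n in multiset.items() for i in range(1, n + 1))


def jaccard(x, y):
    """Multiset Jaccard: sum of min counts over sum of max counts."""
    union = sum((x | y).values())
    return sum((x & y).values()) / union if union else 1.0


def passes_size_filter(x, y, threshold):
    """A pair can reach the threshold only if small >= threshold * large."""
    small, large = sorted((sum(x.values()), sum(y.values())))
    return small >= threshold * large - EPS


def prefix(tokens, threshold):
    """First n - ceil(threshold * n) + 1 tokens under the global order."""
    n = len(tokens)
    keep = n - math.ceil(threshold * n - EPS) + 1
    return set(tokens[:keep])


def similarity_join(records, threshold):
    """Return id pairs whose multiset Jaccard similarity meets the threshold."""
    ids = sorted(records)
    result = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            x, y = records[a], records[b]
            if not passes_size_filter(x, y, threshold):
                continue  # pruned by size alone
            if not prefix(expand(x), threshold) & prefix(expand(y), threshold):
                continue  # pruned: prefixes share no token
            if jaccard(x, y) >= threshold:  # verify surviving candidates
                result.append((a, b))
    return result


# Hypothetical sample data: each record is a multiset of characters.
records = {
    "u1": Counter("aabbc"),
    "u2": Counter("aabbd"),
    "u3": Counter("xyz"),
}
pairs = similarity_join(records, 0.6)
print(pairs)
```

In this sketch the two filters only discard pairs that cannot reach the threshold, so the final verification step sees fewer candidates while the result stays the same as a brute-force comparison. A distributed version would have to decide how tokens and records are partitioned across mappers and reducers, which this example does not attempt.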
Pages: 930-954
Page count: 25
Related papers
50 in total
  • [1] Strategic and suave processing for performing similarity joins using MapReduce
    Mahalakshmi Lakshminarayanan
    William F. Acosta
    Robert C. Green
    Vijay Devabhaktuni
    [J]. The Journal of Supercomputing, 2014, 69 : 930 - 954
  • [2] Metric Similarity Joins Using MapReduce
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (03) : 656 - 669
  • [3] Privacy preserving similarity joins using MapReduce
    Ding, Xiaofeng
    Yang, Wanlu
    Choo, Kim-Kwang Raymond
    Wang, Xiaoli
    Jin, Hai
    [J]. INFORMATION SCIENCES, 2019, 493 : 20 - 33
  • [4] Metric Similarity Joins Using MapReduce (Extended abstract)
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    [J]. 2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1787 - 1788
  • [5] Efficient processing distributed joins with bloomfilter using MapReduce
    School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
    [J]. Int. J. Grid Distrib. Comput., 2013, 3 (43-58):
  • [6] Efficient Processing Distributed Joins with Bloomfilter using MapReduce
    Zhang, Changchun
    Wu, Lei
    Li, Jing
    [J]. INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2013, 6 (03): : 43 - 57
  • [7] Set Similarity Joins on MapReduce: An Experimental Survey
    Fier, Fabian
    Augsten, Nikolaus
    Bouros, Panagiotis
    Leser, Ulf
    Freytag, Johann-Christoph
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (10): : 1110 - 1122
  • [8] Efficient and Scalable Graph Similarity Joins in MapReduce
    Chen, Yifan
    Zhao, Xiang
    Xiao, Chuan
    Zhang, Weiming
    Tang, Jiuyang
    [J]. SCIENTIFIC WORLD JOURNAL, 2014,
  • [9] Practising Scalable Graph Similarity Joins in MapReduce
    Chen, Yifan
    Zhao, Xiang
    Ge, Bin
    Xiao, Chuan
    Chi, Chi-Hung
    [J]. 2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 112 - 119
  • [10] Fast and scalable vector similarity joins with MapReduce
    Yang, Byoungju
    Kim, Hyun Joon
    Shim, Junho
    Lee, Dongjoo
    Lee, Sang-goo
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2016, 46 (03) : 473 - 497