A Scalable Similarity Join Algorithm Based on MapReduce and LSH

Cited by: 3
Authors
Rivault, Sébastien [1]
Bamha, Mostafa [1]
Limet, Sébastien [1]
Robert, Sophie [1]
Affiliation
[1] Univ. Orléans, INSA Centre Val de Loire, LIFO EA 4022, Orléans, France
Keywords
Similarity join operations; Locality-sensitive hashing (LSH); MapReduce model; Data skew; Hadoop framework
DOI
10.1007/s10766-022-00733-6
Chinese Library Classification
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Similarity joins are recognized as being among the most useful data processing and analysis operations. A similarity join retrieves all pairs of data items whose distance is smaller than a predefined threshold ε. In this paper, we introduce the MRS-join algorithm to perform similarity joins on large trajectory datasets. The MapReduce model and a randomized redistribution of locality-sensitive hashing (LSH) keys are used to balance the load among processing nodes, while distributed histograms restrict communication and computation almost exclusively to relevant data. A cost analysis of the MRS-join algorithm shows that our approach is insensitive to data skew and guarantees perfect load-balancing properties in large-scale systems during all stages of the similarity join computation. These properties have been confirmed by a series of experiments using the Fréchet distance on large trajectory datasets from real-world and synthetic benchmarks.
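The abstract only outlines MRS-join, so what follows is a minimal single-process sketch of the general idea, not the authors' algorithm. It simulates the map, shuffle and reduce phases of an LSH-based similarity join between two trajectory relations R and S, uses per-relation key histograms to drop keys that occur on only one side before the shuffle, and verifies each candidate pair exactly with the discrete Fréchet distance (a common discrete variant; the paper's experiments may use a different one). The grid-snapping hash `grid_key` is an assumed LSH family chosen for illustration, and every function name and parameter (`eps`, `cell`, `tables`, `seed`) is hypothetical. The sketch does not model the paper's randomized key redistribution for skew. Because LSH is probabilistic, a true pair can be missed if it never shares a bucket; identical trajectories, however, always collide, and exact verification rules out false positives.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import product

Point = tuple[float, float]
Trajectory = list[Point]


def discrete_frechet(p: Trajectory, q: Trajectory) -> float:
    """Discrete Fréchet distance of two non-empty trajectories (classic dynamic program)."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def grid_key(t: Trajectory, cell: float, shift: Point) -> tuple:
    """Hypothetical LSH key: snap points to a shifted grid, then drop consecutive repeats."""
    snapped = [(math.floor((x + shift[0]) / cell), math.floor((y + shift[1]) / cell))
               for x, y in t]
    key = [snapped[0]]
    for c in snapped[1:]:
        if c != key[-1]:
            key.append(c)
    return tuple(key)


def lsh_similarity_join(R: dict[str, Trajectory], S: dict[str, Trajectory],
                        eps: float, cell: float, tables: int = 4,
                        seed: int = 0) -> set[tuple[str, str]]:
    """Return (r_id, s_id) pairs with discrete Fréchet distance < eps (LSH may miss some)."""
    rng = random.Random(seed)
    shifts = [(rng.uniform(0, cell), rng.uniform(0, cell)) for _ in range(tables)]

    # Map: emit one (table, key) per record and hash table, tagged with its relation.
    emitted = []
    for side, rel in (("R", R), ("S", S)):
        for rid, traj in rel.items():
            for h, shift in enumerate(shifts):
                emitted.append(((h, grid_key(traj, cell, shift)), side, rid))

    # Histograms: a key can only produce output if it occurs in both relations.
    hist = {"R": Counter(), "S": Counter()}
    for key, side, _ in emitted:
        hist[side][key] += 1
    shared = hist["R"].keys() & hist["S"].keys()

    # Shuffle: group only the records whose key survived the histogram filter.
    buckets = defaultdict(lambda: {"R": [], "S": []})
    for key, side, rid in emitted:
        if key in shared:
            buckets[key][side].append(rid)

    # Reduce: verify every candidate pair exactly; the result set removes duplicates
    # produced when the same pair collides in several hash tables.
    result = set()
    for group in buckets.values():
        for r_id, s_id in product(group["R"], group["S"]):
            if (r_id, s_id) not in result and discrete_frechet(R[r_id], S[s_id]) < eps:
                result.add((r_id, s_id))
    return result


if __name__ == "__main__":
    R = {"r1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], "r2": [(50.0, 50.0), (51.0, 50.0)]}
    S = {"s1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], "s2": [(100.0, 0.0), (101.0, 0.0)]}
    print(lsh_similarity_join(R, S, eps=0.5, cell=2.0))  # → {('r1', 's1')}
```

Using several hash tables with independent grid shifts trades extra shuffled records for higher recall; the histogram filter keeps that extra traffic limited to keys that can actually produce output.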
Pages: 360-380
Page count: 21
Related papers
  • [1] A Scalable Similarity Join Algorithm Based on MapReduce and LSH
    Rivault, Sébastien
    Bamha, Mostafa
    Limet, Sébastien
    Robert, Sophie
    International Journal of Parallel Programming, 2022, 50: 360-380
  • [2] Towards a Scalable Set Similarity Join Using MapReduce and LSH
    Rivault, Sébastien
    Bamha, Mostafa
    Limet, Sébastien
    Robert, Sophie
    Computational Science - ICCS 2022, Part I, 2022: 569-583
  • [3] Scalable Metric Similarity Join Using MapReduce
    Wu, Jiacheng
    Zhang, Yong
    Wang, Jin
    Lin, Chunbin
    Fu, Yingjia
    Xing, Chunxiao
    2019 IEEE 35th International Conference on Data Engineering (ICDE 2019), 2019: 1662-1665
  • [4] An Efficient MapReduce Algorithm for Similarity Join in Metric Spaces
    Liu, Wen
    Shen, Yanming
    Wang, Peng
    The Journal of Supercomputing, 2016, 72(3): 1179-1200
  • [5] Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce
    Pang, Jun
    Gu, Yu
    Xu, Jia
    Bao, Yubin
    Yu, Ge
    Web-Age Information Management, WAIM 2014, 2014, 8485: 415-418
  • [6] A Generic Method for Accelerating LSH-Based Similarity Join Processing
    Yu, Chenyun
    Nutanong, Sarana
    Li, Hangyu
    Wang, Cong
    Yuan, Xingliang
    IEEE Transactions on Knowledge and Data Engineering, 2017, 29(4): 712-726
  • [7] A Density-Aware Similarity Join Query Processing Algorithm on MapReduce
    Jang, Miyoung
    Song, Youngho
    Chang, Jae-Woo
    Advanced Multimedia and Ubiquitous Engineering: FutureTech & MUE, 2016, 393: 469-475
  • [8] CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and MapReduce
    Giannakouris-Salalidis, Victor
    Plerou, Antonia
    Sioutas, Spyros
    IFIP Advances in Information and Communication Technology, 2014, 437: 211-220
  • [9] Multidimensional Similarity Join Using MapReduce
    Li, Ye
    Wang, Jian
    U, Leong Hou
    Web-Age Information Management, Part II, 2016, 9659: 457-468