Metric Similarity Joins Using MapReduce

Cited by: 22
Authors
Chen, Gang [1 ,2 ]
Yang, Keyu [1 ]
Chen, Lu [1 ]
Gao, Yunjun [1 ,2 ]
Zheng, Baihua [3 ]
Chen, Chun [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Key Lab Big Data Intelligent Comp Zhejiang Prov, 38 Zheda Rd, Hangzhou 310027, Peoples R China
[3] Singapore Management Univ, Sch Informat Syst, Singapore 178902, Singapore
Keywords
Similarity joins; metric space; MapReduce; algorithm; QUERIES;
DOI
10.1109/TKDE.2016.2631599
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Given two object sets Q and O, a metric similarity join finds similar object pairs according to a certain criterion. This operation has a wide variety of applications, including data cleaning and data mining. However, the rapidly growing volume of data challenges traditional metric similarity join methods, and thus a distributed method is required. In this paper, we adopt a popular distributed framework, namely MapReduce, to support scalable metric similarity joins. To ensure load balancing, we present two sampling-based partition methods. One utilizes the pivot and the space-filling curve mappings to cluster the data into one-dimensional space, and then selects high-quality centroids to enable equal-sized partitions. The other uses the KD-tree partitioning technique to equally divide the data after the pivot mapping. To avoid unnecessary object pair evaluation, we propose a framework that maps the two involved object sets in order, where the range-object filtering, the double-pivot filtering, the pivot filtering, and the plane sweeping techniques are utilized for pruning. Extensive experiments with both real and synthetic data sets demonstrate that our solutions significantly outperform existing state-of-the-art competitors.
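As an illustration of the pivot mapping and pivot filtering mentioned in the abstract, the following is a minimal single-machine sketch in Python, not the paper's MapReduce implementation. It assumes the join criterion is a distance threshold `eps` (the abstract only says "a certain criterion"), uses Euclidean distance as the metric, and takes the pivots as given; the names `pivot_map` and `similarity_join` are hypothetical. The pruning rule follows from the triangle inequality: for any pivot p, |d(q, p) - d(o, p)| <= d(q, o), so a pair whose mapped coordinates differ by more than `eps` on any pivot cannot qualify.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance, used here as an example metric (assumption)."""
    return math.dist(a, b)


def pivot_map(obj, pivots, dist):
    """Map an object to the vector of its distances to each pivot."""
    return tuple(dist(obj, p) for p in pivots)


def similarity_join(Q, O, pivots, eps, dist=euclidean):
    """Return (matching pairs, number of pairs verified with the real metric).

    Assumes a range join: a pair (q, o) matches when dist(q, o) <= eps.
    """
    q_mapped = [pivot_map(q, pivots, dist) for q in Q]
    o_mapped = [pivot_map(o, pivots, dist) for o in O]

    results = []
    verified = 0
    for (q, qv), (o, ov) in product(zip(Q, q_mapped), zip(O, o_mapped)):
        # Pivot filtering: if the pivot distances differ by more than eps
        # on any pivot, the triangle inequality rules the pair out.
        if any(abs(a - b) > eps for a, b in zip(qv, ov)):
            continue
        # Only the surviving candidates pay for a real distance computation.
        verified += 1
        if dist(q, o) <= eps:
            results.append((q, o))
    return results, verified


if __name__ == "__main__":
    Q = [(0, 0), (5, 5)]
    O = [(0, 1), (9, 9), (5, 4)]
    pivots = [(0, 0), (10, 0)]
    pairs, verified = similarity_join(Q, O, pivots, eps=1.5)
    print(pairs, verified)
```

The filter never discards a true match, so the result equals that of a brute-force join; the benefit is that fewer pairs reach the full distance computation. In a distributed setting the mapped vectors would additionally drive the partitioning, which this sketch does not attempt.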
Pages: 656-669
Number of pages: 14
Related papers
50 in total
  • [1] Metric Similarity Joins Using MapReduce (Extended abstract)
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1787 - 1788
  • [2] Privacy preserving similarity joins using MapReduce
    Ding, Xiaofeng
    Yang, Wanlu
    Choo, Kim-Kwang Raymond
    Wang, Xiaoli
    Jin, Hai
    INFORMATION SCIENCES, 2019, 493 : 20 - 33
  • [3] Scalable Metric Similarity Join using MapReduce
    Wu, Jiacheng
    Zhang, Yong
    Wang, Jin
    Lin, Chunbin
    Fu, Yingjia
    Xing, Chunxiao
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 1662 - 1665
  • [4] Strategic and suave processing for performing similarity joins using MapReduce
    Mahalakshmi Lakshminarayanan
    William F. Acosta
    Robert C. Green
    Vijay Devabhaktuni
    The Journal of Supercomputing, 2014, 69 : 930 - 954
  • [5] Strategic and suave processing for performing similarity joins using MapReduce
    Lakshminarayanan, Mahalakshmi
    Acosta, William F.
    Green, Robert C., II
    Devabhaktuni, Vijay
    JOURNAL OF SUPERCOMPUTING, 2014, 69 (02): : 930 - 954
  • [6] Metric space similarity joins
    Jacox, Edwin H.
    Samet, Hanan
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2008, 33 (02):
  • [7] Set Similarity Joins on MapReduce: An Experimental Survey
    Fier, Fabian
    Augsten, Nikolaus
    Bouros, Panagiotis
    Leser, Ulf
    Freytag, Johann-Christoph
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (10): : 1110 - 1122
  • [8] Efficient and Scalable Graph Similarity Joins in MapReduce
    Chen, Yifan
    Zhao, Xiang
    Xiao, Chuan
    Zhang, Weiming
    Tang, Jiuyang
    SCIENTIFIC WORLD JOURNAL, 2014,
  • [9] Practising Scalable Graph Similarity Joins in MapReduce
    Chen, Yifan
    Zhao, Xiang
    Ge, Bin
    Xiao, Chuan
    Chi, Chi-Hung
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 112 - 119
  • [10] Fast and scalable vector similarity joins with MapReduce
    Yang, Byoungju
    Kim, Hyun Joon
    Shim, Junho
    Lee, Dongjoo
    Lee, Sang-goo
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2016, 46 (03) : 473 - 497