Multidimensional Similarity Join Using MapReduce

Cited by: 1
Authors
Li, Ye [1]
Wang, Jian [1]
Hou, Leong U. [1]
Affiliations
[1] Univ Macau, Zhuhai Res Inst, Macau, Peoples R China
DOI
10.1007/978-3-319-39958-4_36
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Similarity join is arguably one of the most important operators in multidimensional data analysis. However, processing a similarity join is costly, especially for large-volume, high-dimensional data. In this work, we process the similarity join on MapReduce so that the join computation can scale horizontally. To balance the workload across all MapReduce nodes, we systematically select the most profitable feature using a novel data selectivity approach. Given the selected feature, we develop a partitioning scheme for MapReduce processing based on two different optimization goals. Our proposed techniques are extensively evaluated on real datasets.
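The general idea in the abstract (pick one feature, partition the data along it, and join within partitions) can be illustrated with a minimal single-process sketch. This is not the authors' method: the abstract does not specify the selectivity measure or the partitioning scheme, so `select_feature` below uses a hypothetical stand-in (the dimension with the widest value range), the partitions are fixed-width strips of width `eps`, and the map, shuffle, and reduce phases are simulated with plain Python functions. All function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import dist, floor


def select_feature(points):
    """Hypothetical stand-in for feature selection: widest value range."""
    dims = range(len(points[0]))
    return max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))


def map_phase(points, dim, eps):
    """Emit each point to its home strip and, as a guest, to the next strip."""
    lo = min(p[dim] for p in points)
    for pid, p in enumerate(points):
        cell = floor((p[dim] - lo) / eps)  # strips have width eps along `dim`
        yield cell, ("home", pid, p)
        yield cell + 1, ("guest", pid, p)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(records, eps):
    """Join home points with each other and with guests from the previous strip."""
    home = [(pid, p) for tag, pid, p in records if tag == "home"]
    guest = [(pid, p) for tag, pid, p in records if tag == "guest"]
    for (i, p), (j, q) in combinations(home, 2):
        if dist(p, q) <= eps:
            yield (min(i, j), max(i, j))
    for i, p in home:
        for j, q in guest:
            if dist(p, q) <= eps:
                yield (min(i, j), max(i, j))


def similarity_join(points, eps):
    """Return index pairs (i, j), i < j, whose Euclidean distance is at most eps."""
    dim = select_feature(points)
    groups = shuffle(map_phase(points, dim, eps))
    result = set()
    for records in groups.values():
        result.update(reduce_phase(records, eps))
    return result


points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.4, 0.3), (10.0, 10.0)]
pairs = similarity_join(points, eps=1.0)
```

Because each strip is `eps` wide, two points within distance `eps` fall into the same strip or adjacent strips, so replicating every point to one neighboring strip is enough to find each qualifying pair exactly once. Fixed-width strips do nothing to balance reducer load on skewed data, which is the problem the abstract says the paper's selectivity-based feature selection and partitioning are meant to address.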
Pages: 457-468
Number of pages: 12