Fast similarity join for multi-dimensional data

Cited by: 13
Authors
Kalashnikov, Dmitri V.
Prabhakar, Sunil
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA)
Keywords
similarity join; grid-based joins
DOI
10.1016/j.is.2005.07.002
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data in these applications ranges from low to high. Most existing methods focus on executing high-dimensional joins over large amounts of disk-based data. The increasing size of main memory on current computers, together with the need for efficient processing of spatial joins, suggests that spatial joins for a large class of problems can be processed in main memory. In this paper, we develop two new in-memory spatial join algorithms, the Grid-join and the EGO*-join, and study their performance. Through evaluation, we explore the domain of applicability of each approach and recommend a join algorithm based on the dimensionality of the data and the expected selectivity of the join. We show that the two proposed join techniques substantially outperform the state-of-the-art join algorithm, the EGO-join. © 2005 Elsevier B.V. All rights reserved.
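For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of a generic in-memory, grid-based similarity join. It is not the paper's Grid-join or EGO*-join, whose details are not given in this record. The function name `epsilon_join`, the use of Euclidean distance, and the choice of a cell side equal to the threshold `eps` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from math import floor


def epsilon_join(a, b, eps):
    """Return sorted index pairs (i, j) with Euclidean distance(a[i], b[j]) <= eps.

    Hypothetical helper for illustration only; not the algorithm from the paper.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")

    # Bucket every point of b by the grid cell (side length eps) containing it.
    grid = defaultdict(list)
    for j, q in enumerate(b):
        grid[tuple(floor(x / eps) for x in q)].append(j)

    eps_sq = eps * eps
    result = []
    for i, p in enumerate(a):
        cell = tuple(floor(x / eps) for x in p)
        # Two points within eps differ by at most one cell index per dimension,
        # so only the 3^d cells around p's cell can contain a match.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for j in grid.get(neighbour, ()):
                dist_sq = sum((x - y) ** 2 for x, y in zip(p, b[j]))
                if dist_sq <= eps_sq:
                    result.append((i, j))
    return sorted(result)


if __name__ == "__main__":
    points_a = [(0.0, 0.0), (1.0, 1.0)]
    points_b = [(0.25, 0.0), (1.0, 1.25), (5.0, 5.0)]
    print(epsilon_join(points_a, points_b, 0.5))
```

In this sketch the number of neighbouring cells probed per point grows as 3^d with the dimensionality d. That growth is one reason a plain grid tends to suit low-dimensional data, and it is in line with the abstract's remark that the best choice of join algorithm depends on dimensionality and selectivity.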
Pages: 160-177
Page count: 18