A Δ-tree based similarity join processing for high-dimensional data

Authors
Liu, Yan [1, 3]
Hao, Zhongxiao [1, 2]
Affiliations
[1] College of Computer and Control, Harbin University of Science and Technology, Harbin 150080, China
[2] College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
[3] College of Computer Science and Technology, Changchun University, Changchun 130022, China
Keywords
Clustering algorithms - Principal component analysis - Query processing - Trees (mathematics)
DOI
Not available
Abstract
The similarity join, an important data mining primitive, can speed up applications such as similarity search, data analysis, and data mining. Most research so far has focused on executing high-dimensional joins over large amounts of disk-based data. The growing main memory of current computers, together with the need for efficient spatial join processing, suggests that spatial joins for a large class of problems can be processed in main memory. The Δ-tree is a novel multi-level index structure that speeds up high-dimensional queries in a main-memory environment and has been shown to be an efficient index method. Each level of the Δ-tree represents the data space at a different dimensionality: the number of dimensions increases toward the leaf level, which contains the data at full dimensionality. The dimensions retained at each level are obtained using principal component analysis. Exploiting these properties, this paper presents Δ-tree-join, a similarity join algorithm built on the Δ-tree index. Its top-down scheme computes distances using fewer dimensions and thus completes join processing efficiently. Experimental results indicate that Δ-tree-join outperforms the state-of-the-art algorithms EGO-join and EGO*-join by a wide margin and is an efficient similarity join method.
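The pruning idea behind the top-down scheme can be illustrated with a minimal sketch. It is not the Δ-tree-join algorithm itself: it builds no tree and simply compares all pairs. It assumes the points have already been rotated into principal-component order (for example by PCA), so that the Euclidean distance over the first few coordinates is a lower bound on the full distance. The names `sq_dist`, `similarity_self_join`, and `levels` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def sq_dist(p, q, dims):
    """Squared Euclidean distance over the first `dims` coordinates."""
    return sum((a - b) ** 2 for a, b in zip(p[:dims], q[:dims]))


def similarity_self_join(points, eps, levels):
    """Return index pairs (i, j), i < j, whose full distance is <= eps.

    Assumption: coordinates are ordered by importance (e.g. PCA order).
    `levels` lists the reduced dimensionalities to test first, in
    increasing order, mimicking index levels of growing dimensionality.
    """
    full = len(points[0])
    eps2 = eps * eps
    result = []
    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        # A partial distance never exceeds the full distance, so a pair
        # that already exceeds eps in few dimensions can be discarded.
        if any(sq_dist(p, q, d) > eps2 for d in levels):
            continue
        # Survivors are verified at full dimensionality.
        if sq_dist(p, q, full) <= eps2:
            result.append((i, j))
    return result
```

Because the filter only discards pairs whose partial distance already exceeds the threshold, it returns the same pairs as a brute-force join at full dimensionality; the saving comes from rejecting most pairs after touching only a few coordinates.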
Pages: 995-1002
Related papers
  • [1] A KNN-join algorithm based on Δ-tree for high-dimensional data
    Liu, Yan
    Hao, Zhongxiao
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2010, 47 (07): : 1234 - 1243
  • [2] Progressive high-dimensional similarity join
    Tok, Wee Hyong
    Bressan, Stephane
    Lee, Mong-Li
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653 : 233 - +
  • [3] Similarity Query Processing for High-Dimensional Data
    Qin, Jianbin
    Wang, Wei
    Xiao, Chuan
    Zhang, Ying
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (12): : 3437 - 3440
  • [4] Efficient index-based KNN join processing for high-dimensional data
    Yu, Cui
    Cui, Bin
    Wang, Shuguang
    Su, Jianwen
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2007, 49 (04) : 332 - 344
  • [5] High-Dimensional Similarity Query Processing for Data Science
    Qin, Jianbin
    Wang, Wei
    Xiao, Chuan
    Zhang, Ying
    Wang, Yaoshu
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 4062 - 4063
  • [6] Projection Based Large Scale High-Dimensional Data Similarity Join Using MapReduce Framework
    Ma, Youzhong
    Zhang, Ruiling
    Cui, Zhanyou
    Lin, Chunjie
    [J]. IEEE ACCESS, 2020, 8 : 121665 - 121677
  • [7] Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data
    Böhm, C
    Braunmüller, B
    Krebs, F
    Kriegel, HP
    [J]. SIGMOD RECORD, 2001, 30 (02) : 379 - 388
  • [8] PHiDJ: Parallel Similarity Self-Join for High-Dimensional Vector Data with MapReduce
    Fries, Sergej
    Boden, Brigitte
    Stepien, Grzegorz
    Seidl, Thomas
    [J]. 2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 796 - 807
  • [9] A novel approach for high-dimensional vector similarity join query
    Ma, Youzhong
    Jia, Shijie
    Zhang, Yongxin
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (05):
  • [10] Accelerating Similarity-based Mining Tasks on High-dimensional Data by Processing-in-memory
    Wang, Fang
    Yiu, Man Lung
    Shao, Zili
    [J]. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 1859 - 1864