A Δ-tree based similarity join processing for high-dimensional data

Cited by: 0

Authors:

Liu, Yan [1,3]

Hao, Zhongxiao [1,2]

Affiliations:

[1] College of Computer and Control, Harbin University of Science and Technology, Harbin 150080, China

[2] College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

[3] College of Computer Science and Technology, Changchun University, Changchun 130022, China

Source:

Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2009 / Vol. 46 / No. 06

Keywords:

Clustering algorithms; Principal component analysis; Query processing; Trees (mathematics)

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

The similarity join, an important data mining primitive, can speed up applications such as similarity search, data analysis and data mining. So far, most research has focused on executing high-dimensional joins over large amounts of disk-based data. The increasing size of main memory on current computers and the need for efficient processing of spatial joins suggest that spatial joins for a large class of problems can be processed in main memory. The Δ-tree is a novel multi-level index structure that speeds up high-dimensional queries in a main-memory environment and has been proven to be an efficient index method. Each level of the Δ-tree represents the data space at a different dimensionality: the number of dimensions increases toward the leaf level, which contains the data at full dimensionality. The dimensions retained at each level are obtained using principal component analysis. Using the properties of the Δ-tree, a similarity join algorithm based on this index structure, Δ-tree-join, is presented. Its top-down scheme computes distances using fewer dimensions and thus completes join processing efficiently. Experimental results indicate that Δ-tree-join outperforms the state-of-the-art algorithms EGO-join and EGO*-join by a wide margin and is an efficient similarity join method.
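The pruning idea behind working with fewer dimensions first can be sketched as follows. This is not the Δ-tree-join algorithm from the paper: it is a plain nested-loop illustration with no tree index, and the function name `prefix_filter_join`, the `levels` parameter and the sample points are hypothetical. It assumes the points are already expressed in PCA-rotated coordinates, and relies only on the fact that a Euclidean distance computed over a prefix of the dimensions never exceeds the distance over all dimensions, so a pair can be rejected as soon as the partial distance passes the join threshold.

```python
from itertools import product


def prefix_filter_join(r_points, s_points, eps, levels):
    """Return index pairs (i, j) whose Euclidean distance is <= eps.

    Assumption of this sketch: points are already in PCA-rotated
    coordinates, so the leading dimensions carry most of the variance.
    `levels` lists increasing prefix lengths ending at the full
    dimensionality, loosely mimicking coarse-to-fine index levels.
    """
    eps_sq = eps * eps
    result = []
    for (i, r), (j, s) in product(enumerate(r_points), enumerate(s_points)):
        dist_sq = 0.0
        start = 0
        pruned = False
        for k in levels:
            # Extend the partial squared distance to the first k dimensions.
            dist_sq += sum((r[d] - s[d]) ** 2 for d in range(start, k))
            start = k
            # The partial distance is a lower bound on the full distance,
            # so the pair can be dropped once it exceeds the threshold.
            if dist_sq > eps_sq:
                pruned = True
                break
        if not pruned:
            result.append((i, j))
    return result


# Hypothetical 4-dimensional sample data.
R = [(0.0, 0.0, 0.0, 0.0), (5.0, 0.0, 0.0, 0.0)]
S = [(0.1, 0.1, 0.0, 0.0), (0.0, 0.0, 0.0, 3.0)]
pairs = prefix_filter_join(R, S, eps=0.5, levels=(1, 2, 4))
```

Here the pair formed by the second point of `R` with either point of `S` is rejected after inspecting a single dimension, while the pair formed by the first point of `R` and the second point of `S` is rejected only at full dimensionality.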

Pages: 995-1002
