Similarity Query Processing for High-Dimensional Data

Times cited: 8

Authors:

Qin, Jianbin [1]

Wang, Wei [2]

Xiao, Chuan [3,4]

Zhang, Ying [5]

Affiliations:

[1] Shenzhen Univ, Shenzhen Inst Comp Sci, Shenzhen, Guangdong, Peoples R China

[2] Univ New South Wales, Sydney, NSW, Australia

[3] Osaka Univ, Suita, Osaka, Japan

[4] Nagoya Univ, Nagoya, Aichi, Japan

[5] Univ Technol Sydney, Sydney, NSW, Australia

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2020 / Vol. 13 / No. 12

Keywords:

NEAREST-NEIGHBOR SEARCH; SMALL WORLD; ALGORITHM; SPACE; LSH

DOI:

10.14778/3415478.3415564

CLC number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Similarity query processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications. Recently, embedding and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing for high-dimensional data. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. In this tutorial, we summarize existing solutions, especially recent advancements from both database (DB) and machine learning (ML) communities, and analyze their strengths and weaknesses. We review exact and approximate methods such as cover tree, locality sensitive hashing, product quantization, and proximity graphs. We also discuss the selectivity estimation problem and show how researchers are bringing in state-of-the-art ML techniques to address the problem. By highlighting the strong connections between DB and ML, we hope that this tutorial provides an impetus towards new ML for DB solutions and vice versa.
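The abstract lists locality sensitive hashing among the approximate methods the tutorial reviews. As a rough illustration of the general idea (not the tutorial's own code and not any particular library's API), the following minimal Python sketch contrasts an exact linear-scan k-NN query with random-hyperplane LSH (SimHash) for cosine similarity, which is one well-known LSH family. The names `cosine`, `exact_knn`, and `HyperplaneLSH` and the parameters `num_bits` and `num_tables` are hypothetical choices made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def exact_knn(data, query, k):
    """Exact k-NN by linear scan: every vector is scored against the query."""
    order = sorted(range(len(data)), key=lambda i: cosine(data[i], query), reverse=True)
    return order[:k]


class HyperplaneLSH:
    """Hypothetical sketch of random-hyperplane LSH (SimHash) for cosine similarity.

    Each table hashes a vector to a num_bits-bit signature: one bit per random
    hyperplane, set by the sign of the dot product. Similar vectors are likely
    to share a signature in at least one of the num_tables tables.
    """

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # planes[t][b] is the normal vector of hyperplane b in table t
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [{} for _ in range(num_tables)]
        self.data = []

    @staticmethod
    def _signature(planes, v):
        # One bit per hyperplane: which side of the plane v falls on
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in planes)

    def add(self, v):
        idx = len(self.data)
        self.data.append(v)
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._signature(planes, v), []).append(idx)

    def query(self, q, k):
        # The union of the query's buckets across all tables is the candidate set
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(planes, q), []))
        # Rerank only the candidates exactly; neighbors that never collided are missed
        order = sorted(candidates, key=lambda i: cosine(self.data[i], q), reverse=True)
        return order[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
    index = HyperplaneLSH(dim=16)
    for v in data:
        index.add(v)
    query = data[7]
    print("exact:", exact_knn(data, query, 5))
    print("LSH:  ", index.query(query, 5))
```

In this sketch, more bits per table make buckets more selective (fewer candidates to rerank, but lower recall), while more tables raise the chance that a true neighbor collides with the query in at least one table. The LSH answer is approximate and may differ from the exact linear-scan answer.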


Pages: 3437 - 3440

Number of pages: 4

50 records in total:

[31] Parallel continuous skyline query over high-dimensional data stream windows
Khames, Walid
Hadjali, Allel
Lagha, Mohand
[J]. DISTRIBUTED AND PARALLEL DATABASES, 2024, 42 (04) : 469 - 524
[32] High-dimensional data
Amaratunga, Dhammika
Cabrera, Javier
[J]. JOURNAL OF THE NATIONAL SCIENCE FOUNDATION OF SRI LANKA, 2016, 44 (01) : 3 - 9
[33] High-dimensional data
Geubbelmans, Melvin
Rousseau, Axel-Jan
Valkenborg, Dirk
Burzykowski, Tomasz
[J]. AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2023, 164 (03) : 453 - 456
[34] On-The-Fly Processing of continuous high-dimensional data streams
Vitale, Raffaele
Zhyrova, Anna
Fortuna, Joao F.
de Noord, Onno E.
Ferrer, Alberto
Martens, Harald
[J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2017, 161 : 118 - 129
[35] Iterative algorithms for the post-processing of high-dimensional data
Espig, Mike
Hackbusch, Wolfgang
Litvinenko, Alexander
Matthies, Hermann G.
Zander, Elmar
[J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2020, 410
[36] Adaptive quantization of the high-dimensional data for efficient KNN processing
Cui, B
Hu, J
Shen, HT
Yu, C
[J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2004, 2973 : 302 - 313
[38] High-dimensional similarity retrieval using dimensional choice
Tahmoush, Dave
Samet, Hanan
[J]. 2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1 AND 2, 2008 : 490 - 497
[39] High-dimensional similarity retrieval using dimensional choice
Tahmoush, Dave
Samet, Hanan
[J]. SISAP 2008: FIRST INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS, 2008 : 35 - 42
[40] Haery: A Hadoop Based Query System on Accumulative and High-Dimensional Data Model for Big Data
Song, Jie
He, HongYan
Thomas, Richard
Bao, Yubin
Yu, Ge
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (07) : 1362 - 1377
