Similarity Query Processing for High-Dimensional Data

Cited by: 8

Authors:

Qin, Jianbin [1]

Wang, Wei [2]

Xiao, Chuan [3,4]

Zhang, Ying [5]

Affiliations:

[1] Shenzhen Univ, Shenzhen Inst Comp Sci, Shenzhen, Guangdong, Peoples R China

[2] Univ New South Wales, Sydney, NSW, Australia

[3] Osaka Univ, Suita, Osaka, Japan

[4] Nagoya Univ, Nagoya, Aichi, Japan

[5] Univ Technol Sydney, Sydney, NSW, Australia

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2020 / Vol. 13 / Issue 12

Keywords:

NEAREST-NEIGHBOR SEARCH; SMALL WORLD; ALGORITHM; SPACE; LSH;

DOI:

10.14778/3415478.3415564

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Similarity query processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications. Recently, embedding and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing for high-dimensional data. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. In this tutorial, we summarize existing solutions, especially recent advancements from both database (DB) and machine learning (ML) communities, and analyze their strengths and weaknesses. We review exact and approximate methods such as cover tree, locality sensitive hashing, product quantization, and proximity graphs. We also discuss the selectivity estimation problem and show how researchers are bringing in state-of-the-art ML techniques to address the problem. By highlighting the strong connections between DB and ML, we hope that this tutorial provides an impetus towards new ML for DB solutions and vice versa.


Pages: 3437-3440

Number of pages: 4
