Similarity Query Processing for High-Dimensional Data

Cited by: 8
Authors
Qin, Jianbin [1]
Wang, Wei [2]
Xiao, Chuan [3, 4]
Zhang, Ying [5]
Affiliations
[1] Shenzhen Univ, Shenzhen Inst Comp Sci, Shenzhen, Guangdong, Peoples R China
[2] Univ New South Wales, Sydney, NSW, Australia
[3] Osaka Univ, Suita, Osaka, Japan
[4] Nagoya Univ, Nagoya, Aichi, Japan
[5] Univ Technol Sydney, Sydney, NSW, Australia
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2020 / Vol. 13 / No. 12
Keywords
NEAREST-NEIGHBOR SEARCH; SMALL WORLD; ALGORITHM; SPACE; LSH;
DOI
10.14778/3415478.3415564
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Similarity query processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications. Recently, embedding and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing for high-dimensional data. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. In this tutorial, we summarize existing solutions, especially recent advancements from both database (DB) and machine learning (ML) communities, and analyze their strengths and weaknesses. We review exact and approximate methods such as cover tree, locality sensitive hashing, product quantization, and proximity graphs. We also discuss the selectivity estimation problem and show how researchers are bringing in state-of-the-art ML techniques to address the problem. By highlighting the strong connections between DB and ML, we hope that this tutorial provides an impetus towards new ML for DB solutions and vice versa.
Pages: 3437 - 3440
Page count: 4
Related papers
50 in total
  • [1] High-Dimensional Similarity Query Processing for Data Science
    Qin, Jianbin
    Wang, Wei
    Xiao, Chuan
    Zhang, Ying
    Wang, Yaoshu
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 4062 - 4063
  • [2] Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Tang, Mingjie
    Yu, Yongyang
    Aref, Walid G.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    [J]. 2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 2113 - 2114
  • [3] PROM: Efficient matching query processing on high-dimensional data
    Ma, Chunyang
    Zhou, Yongluan
    Shou, Lidan
    Chen, Gang
    [J]. INFORMATION SCIENCES, 2015, 322 : 1 - 19
  • [4] Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Tang, Mingjie
    Yu, Yongyang
    Aref, Walid G.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (10) : 1838 - 1851
  • [5] An efficient algorithm for hyperspherical range query processing in high-dimensional data space
    Lee, DH
    Heu, S
    Kim, HJ
    [J]. INFORMATION PROCESSING LETTERS, 2002, 83 (02) : 115 - 123
  • [6] MUD: Mapping-based query processing for high-dimensional uncertain data
    Shou, Lidan
    Zhang, Xiaolong
    Chen, Gang
    Gao, Yuan
    Chen, Ke
    [J]. INFORMATION SCIENCES, 2012, 198 : 147 - 168
  • [7] A novel approach for high-dimensional vector similarity join query
    Ma, Youzhong
    Jia, Shijie
    Zhang, Yongxin
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (05):
  • [8] qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces
    Jafari, Omid
    Ossorgin, John
    Nagarkar, Parth
    [J]. ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 329 - 333
  • [9] Similarity Learning for High-Dimensional Sparse Data
    Liu, Kuan
    Bellet, Aurelien
    Sha, Fei
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 653 - 662
  • [10] High-Dimensional Similarity Search for Scalable Data Science
    Echihabi, Karima
    Zoumpatianos, Kostas
    Palpanas, Themis
    [J]. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 2369 - 2372