Similarity Query Processing for High-Dimensional Data

Cited by: 8
Authors
Qin, Jianbin [1]
Wang, Wei [2]
Xiao, Chuan [3, 4]
Zhang, Ying [5]
Affiliations
[1] Shenzhen Univ, Shenzhen Inst Comp Sci, Shenzhen, Guangdong, Peoples R China
[2] Univ New South Wales, Sydney, NSW, Australia
[3] Osaka Univ, Suita, Osaka, Japan
[4] Nagoya Univ, Nagoya, Aichi, Japan
[5] Univ Technol Sydney, Sydney, NSW, Australia
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2020 / Vol. 13 / No. 12
Keywords
NEAREST-NEIGHBOR SEARCH; SMALL WORLD; ALGORITHM; SPACE; LSH;
DOI
10.14778/3415478.3415564
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Similarity query processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications. Recently, embedding and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing for high-dimensional data. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. In this tutorial, we summarize existing solutions, especially recent advancements from both database (DB) and machine learning (ML) communities, and analyze their strengths and weaknesses. We review exact and approximate methods such as cover tree, locality sensitive hashing, product quantization, and proximity graphs. We also discuss the selectivity estimation problem and show how researchers are bringing in state-of-the-art ML techniques to address the problem. By highlighting the strong connections between DB and ML, we hope that this tutorial provides an impetus towards new ML for DB solutions and vice versa.
Pages: 3437 - 3440
Number of pages: 4
Related papers
50 in total
  • [31] Parallel continuous skyline query over high-dimensional data stream windows
    Khames, Walid
    Hadjali, Allel
    Lagha, Mohand
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2024, 42 (04) : 469 - 524
  • [32] High-dimensional data
    Amaratunga, Dhammika
    Cabrera, Javier
    [J]. JOURNAL OF THE NATIONAL SCIENCE FOUNDATION OF SRI LANKA, 2016, 44 (01) : 3 - 9
  • [33] High-dimensional data
    Geubbelmans, Melvin
    Rousseau, Axel-Jan
    Valkenborg, Dirk
    Burzykowski, Tomasz
    [J]. AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2023, 164 (03) : 453 - 456
  • [34] On-The-Fly Processing of continuous high-dimensional data streams
    Vitale, Raffaele
    Zhyrova, Anna
    Fortuna, Joao F.
    de Noord, Onno E.
    Ferrer, Alberto
    Martens, Harald
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2017, 161 : 118 - 129
  • [35] Iterative algorithms for the post-processing of high-dimensional data
    Espig, Mike
    Hackbusch, Wolfgang
    Litvinenko, Alexander
    Matthies, Hermann G.
    Zander, Elmar
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2020, 410
  • [36] Adaptive quantization of the high-dimensional data for efficient KNN processing
    Cui, B
    Hu, J
    Shen, HT
    Yu, C
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2004, 2973 : 302 - 313
  • [37] Iterative algorithms for the post-processing of high-dimensional data
    Espig, Mike
    Hackbusch, Wolfgang
    Litvinenko, Alexander
    Matthies, Hermann G.
    Zander, Elmar
    [J]. Journal of Computational Physics, 2020, 410
  • [38] High-dimensional similarity retrieval using dimensional choice
    Tahmoush, Dave
    Samet, Hanan
    [J]. 2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1 AND 2, 2008, : 490 - 497
  • [39] High-dimensional similarity retrieval using dimensional choice
    Tahmoush, Dave
    Samet, Hanan
    [J]. SISAP 2008: FIRST INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS, 2008, : 35 - 42
  • [40] Haery: A Hadoop Based Query System on Accumulative and High-Dimensional Data Model for Big Data
    Song, Jie
    He, HongYan
    Thomas, Richard
    Bao, Yubin
    Yu, Ge
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (07) : 1362 - 1377