MSQL: efficient similarity search in metric spaces using SQL

Cited by: 19
Authors
Lu, Wei [1 ,2 ]
Hou, Jiajia [1 ,2 ]
Yan, Ying [4 ]
Zhang, Meihui [3 ]
Du, Xiaoyong [1 ,2 ]
Moscibroda, Thomas [5 ]
Affiliations
[1] Renmin Univ China, MOE, DEKE, Beijing, Peoples R China
[2] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[4] Microsoft Res, Beijing, Peoples R China
[5] Microsoft Azure, Redmond, WA USA
Source
VLDB JOURNAL | 2017 / Vol. 26 / Issue 06
Funding
National Natural Science Foundation of China;
Keywords
Similarity search; Metric space; Query optimization; SQL-based; RDBMS; DATABASES; QUERIES; INDEX; TREES; JOINS;
DOI
10.1007/s00778-017-0481-6
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Similarity search is a primitive operation that arises in a large variety of database applications. Typical examples include identifying articles with similar titles and finding similar images and music in a large digital object repository. While there exists a wide spectrum of access methods for similarity queries in metric spaces, a practical solution that can be fully supported by existing RDBMS with high efficiency still remains an open problem. In this paper, we present MSQL, a practical solution for answering similarity queries in metric spaces fully using SQL. To the best of our knowledge, MSQL enables users to find similar objects by submitting SELECT-FROM-WHERE statements only. MSQL provides a uniform indexing scheme based on a standard built-in B+-tree index, with the ability to accelerate the query processing using index seek. Various query optimization techniques are incorporated in MSQL to significantly reduce CPU and I/O cost. We deploy MSQL on top of PostgreSQL. Extensive experiments on various real data sets demonstrate MSQL's benefits, performing up to two orders of magnitude faster than existing domain-specific SQL-based solutions and being comparable to native solutions.
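The general idea of answering a metric range query with a plain SELECT-FROM-WHERE statement over an ordinary tree index can be illustrated with a minimal sketch. This is not MSQL's actual schema or algorithm, which the abstract does not describe; it is the textbook single-pivot filter based on the triangle inequality. The table `articles`, the column `pivot_dist`, and the functions `build` and `range_query` are hypothetical names, and SQLite stands in for PostgreSQL only to keep the example self-contained.

```python
import sqlite3


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, which is a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build(conn: sqlite3.Connection, titles: list[str], pivot: str) -> None:
    """Store each title with its precomputed distance to one pivot object."""
    # Expose the metric to SQL as a user-defined function.
    conn.create_function("edit_distance", 2, edit_distance)
    conn.execute("CREATE TABLE articles (title TEXT, pivot_dist INTEGER)")
    # Ordinary built-in tree index on the one-dimensional pivot distance.
    conn.execute("CREATE INDEX idx_pivot ON articles (pivot_dist)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [(t, edit_distance(t, pivot)) for t in titles],
    )


def range_query(conn: sqlite3.Connection, query: str, pivot: str,
                radius: int) -> list[str]:
    """Return all titles within `radius` of `query`.

    By the triangle inequality, d(o, q) <= r implies
    |d(o, p) - d(q, p)| <= r, so the BETWEEN predicate never discards a
    true answer; the exact distance check removes false positives.
    """
    dq = edit_distance(query, pivot)
    rows = conn.execute(
        "SELECT title FROM articles "
        "WHERE pivot_dist BETWEEN ? AND ? "
        "AND edit_distance(title, ?) <= ? "
        "ORDER BY title",
        (dq - radius, dq + radius, query, radius),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn, ["metric space", "metric spaces", "matrix space",
                 "query optimization"], pivot="metric")
    print(range_query(conn, "metric space", "metric", radius=1))
```

The range predicate on `pivot_dist` is the part an index seek can serve; the user-defined distance function is only needed for the candidates that survive it. A real system would likely use several pivots and a more careful mapping to index keys.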
Pages: 829 - 854
Page count: 26
Related papers
50 items in total
  • [31] NM-tree: Flexible approximate similarity search in metric and non-metric spaces
    Skopal, Tomas
    Lokoc, Jakub
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2008, 5181 : 312 - 325
  • [32] The Duality of Similarity and Metric Spaces
    Rozinek, Ondrej
    Mares, Jan
    APPLIED SCIENCES-BASEL, 2021, 11 (04): : 1 - 18
  • [33] Similarity join in metric spaces
    Dohnal, V
    Gennaro, C
    Savino, P
    Zezula, P
    ADVANCES IN INFORMATION RETRIEVAL, 2003, 2633 : 452 - 467
  • [34] An efficient algorithm for approximated self-similarity joins in metric spaces
    Ferrada, Sebastian
    Bustos, Benjamin
    Reyes, Nora
    INFORMATION SYSTEMS, 2020, 91
  • [35] Clustering-based similarity search in metric spaces with sparse spatial centers
    Brisaboa, Nieves
    Pedreira, Oscar
    Seco, Diego
    Solar, Roberto
    Uribe, Roberto
    SOFSEM 2008: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2008, 4910 : 186 - +
  • [36] Efficient similarity search in nonmetric spaces with local constant embedding
    Chen, Lei
    Lian, Xiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (03) : 321 - 336
  • [37] Metric Trees for Efficient Similarity Search in Large Process Model Repositories
    Kunze, Matthias
    Weske, Mathias
    BUSINESS PROCESS MANAGEMENT WORKSHOPS, 2011, 66 : 535 - 546
  • [38] Metric Index: An efficient and scalable solution for precise and approximate similarity search
    Novak, David
    Batko, Michal
    Zezula, Pavel
    INFORMATION SYSTEMS, 2011, 36 (04) : 721 - 733
  • [39] Similarity join in metric spaces using eD-Index
    Dohnal, V
    Gennaro, C
    Zezula, P
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2003, 2736 : 484 - 493
  • [40] Using tuneable fuzzy similarity in non-metric search
    Vojtas, Peter
    Eckhardt, Alan
    SISAP 2009: 2009 SECOND INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS, 2009, : 163 - 164