MSQL: efficient similarity search in metric spaces using SQL

Cited by: 19
Authors
Lu, Wei [1 ,2 ]
Hou, Jiajia [1 ,2 ]
Yan, Ying [4 ]
Zhang, Meihui [3 ]
Du, Xiaoyong [1 ,2 ]
Moscibroda, Thomas [5 ]
Affiliations
[1] Renmin Univ China, MOE, DEKE, Beijing, Peoples R China
[2] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[4] Microsoft Res, Beijing, Peoples R China
[5] Microsoft Azure, Redmond, WA USA
Source
VLDB JOURNAL | 2017 / Vol. 26 / Issue 06
Funding
National Natural Science Foundation of China;
Keywords
Similarity search; Metric space; Query optimization; SQL-based; RDBMS; DATABASES; QUERIES; INDEX; TREES; JOINS;
DOI
10.1007/s00778-017-0481-6
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Similarity search is a primitive operation that arises in a large variety of database applications. Typical examples include identifying articles with similar titles, finding similar images and music in a large digital object repository, etc. While there exists a wide spectrum of access methods for similarity queries in metric spaces, a practical solution that can be fully supported by existing RDBMS with high efficiency still remains an open problem. In this paper, we present MSQL, a practical solution for answering similarity queries in metric spaces fully using SQL. To the best of our knowledge, MSQL enables users to find similar objects by submitting SELECT-FROM-WHERE statements only. MSQL provides a uniform indexing scheme based on a standard built-in B+-tree index, with the ability to accelerate the query processing using index seek. Various query optimization techniques are incorporated in MSQL to significantly reduce CPU and I/O cost. We deploy MSQL on top of PostgreSQL. Extensive experiments on various real data sets demonstrate MSQL's benefits, performing up to two orders of magnitude faster than existing domain-specific SQL-based solutions and being comparable to native solutions.
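The abstract does not describe the indexing scheme in detail, so the following is only a generic sketch of how a metric range query can be reduced to a plain SELECT-FROM-WHERE statement over an ordinary indexed column. It is not MSQL's algorithm. It assumes a single hypothetical pivot object, edit distance as the metric, and SQLite (via Python's `sqlite3`) standing in for PostgreSQL; the table name `objects`, the column `pivot_dist`, and the helper names are all illustrative. By the triangle inequality, any object o within distance r of the query q must satisfy |d(o, p) - d(q, p)| <= r for the pivot p, so an index range scan on the precomputed d(o, p) yields a candidate set that is then verified with the real distance.

```python
import sqlite3

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, which is a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

PIVOT = "metric"  # hypothetical pivot object, chosen arbitrarily

def build(titles):
    """Store each object with its precomputed distance to the pivot."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (title TEXT, pivot_dist INTEGER)")
    conn.execute("CREATE INDEX idx_pivot ON objects (pivot_dist)")
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?)",
        [(t, edit_distance(t, PIVOT)) for t in titles],
    )
    return conn

def range_query(conn, query, radius):
    """Return all titles within `radius` of `query`, sorted by title."""
    dq = edit_distance(query, PIVOT)
    # Filter step: triangle inequality turned into an index-friendly range.
    rows = conn.execute(
        "SELECT title FROM objects "
        "WHERE pivot_dist BETWEEN ? AND ? ORDER BY title",
        (dq - radius, dq + radius),
    )
    # Verification step: compute the real distance for the candidates only.
    return [t for (t,) in rows if edit_distance(t, query) <= radius]

TITLES = ["metric", "metrics", "matrix", "similarity", "sql"]
```

Calling `range_query(build(TITLES), "metric", 1)` returns the titles within edit distance 1 of the query. The filter never discards a true answer, because the pivot bound is a necessary condition; it only reduces how many real distance computations are needed.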
Pages: 829 - 854
Page count: 26
Related papers
50 in total
  • [1] MSQL: efficient similarity search in metric spaces using SQL
    Wei Lu
    Jiajia Hou
    Ying Yan
    Meihui Zhang
    Xiaoyong Du
    Thomas Moscibroda
    The VLDB Journal, 2017, 26 : 829 - 854
  • [2] MSQL+: A Plugin Toolkit for Similarity Search under Metric Spaces in Distributed Relational Database Systems
    Lu, Wei
    Zhang, Xinyi
    Shui, Zhiyu
    Peng, Zhe
    Zhang, Xiao
    Du, Xiaoyong
    Huang, Hao
    Wang, Xiaoyu
    Pang, Anqun
    Li, Haixiang
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (12): : 1970 - 1973
  • [3] Indexing Dense Nested Metric Spaces for Efficient Similarity Search
    Brisaboa, Nieves R.
    Luaces, Miguel R.
    Pedreira, Oscar
    Places, Angeles S.
    Seco, Diego
    PERSPECTIVES OF SYSTEMS INFORMATICS, 2010, 5947 : 98 - 109
  • [4] Approximate similarity search in metric spaces
    Yang, Hongli
    Journal of Computational Information Systems, 2010, 6 (06): : 1855 - 1862
  • [5] M-tree: An efficient access method for similarity search in metric spaces
    Ciaccia, P
    Patella, M
    Zezula, P
    PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, 1997, : 426 - 435
  • [6] Efficient Metric Indexing for Similarity Search
    Chen, Lu
    Gao, Yunjun
    Li, Xinhan
    Jensen, Christian S.
    Chen, Gang
    2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2015, : 591 - 602
  • [7] An access structure for similarity search in metric spaces
    Dohnal, V
    CURRENT TRENDS IN DATABASE TECHNOLOGY - EDBT 2004 WORKSHOPS, PROCEEDINGS, 2004, 3268 : 133 - 143
  • [8] Indexing Metric Spaces for Exact Similarity Search
    Chen, Lu
    Gao, Yunjun
    Song, Xuan
    Li, Zheng
    Zhu, Yifan
    Miao, Xiaoye
    Jensen, Christian S.
    ACM COMPUTING SURVEYS, 2023, 55 (06)
  • [9] Efficient Metric Indexing for Similarity Search and Similarity Joins
    Chen, Lu
    Gao, Yunjun
    Li, Xinhan
    Jensen, Christian S.
    Chen, Gang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (03) : 556 - 571
  • [10] A Learned Index for Exact Similarity Search in Metric Spaces
    Tian, Yao
    Yan, Tingyun
    Zhao, Xi
    Huang, Kai
    Zhou, Xiaofang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (08) : 7624 - 7638