Clustering for approximate similarity search in high-dimensional spaces

Cited by: 70
Authors
Li, C [1]
Chang, E
Garcia-Molina, H
Wiederhold, G
Affiliations
[1] Stanford Univ, Dept Comp Sci, Palo Alto, CA 94306 USA
[2] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
approximate search; clustering; high-dimensional index; similarity search
DOI
10.1109/TKDE.2002.1019214
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one would like to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and perform significantly better than other approaches. Our scheme is based on finding clusters and, then, building a simple but efficient index for them. We analyze the trade-offs involved in clustering and building such an index structure, and present extensive experimental results.
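The general idea in the abstract (group the points into clusters, index the clusters, then answer a query by reading only the clusters nearest to it) can be illustrated with a minimal sketch. This is not the Clindex algorithm itself: plain k-means is swapped in for the paper's clustering step, and the names `build_index`, `approx_search`, and `n_probe` are hypothetical, chosen only for this example. Here `n_probe` stands in for the number of cluster reads a query is allowed.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Return, for each centroid, the list of point indices closest to it."""
    clusters = [[] for _ in centroids]
    for i, p in enumerate(points):
        j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        clusters[j].append(i)
    return clusters


def build_index(points, k, iters=10, seed=0):
    """Cluster with plain k-means (an assumption, not the paper's method)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        centroids = [
            tuple(sum(col) / len(members)
                  for col in zip(*(points[i] for i in members)))
            if members else centroids[j]  # keep an empty cluster's centroid
            for j, members in enumerate(clusters)
        ]
    # Final assignment so the clusters match the returned centroids.
    return centroids, assign(points, centroids)


def approx_search(query, points, centroids, clusters, n_neighbors, n_probe=1):
    """Scan only the n_probe clusters whose centroids are nearest the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: dist(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in clusters[c]]
    candidates.sort(key=lambda i: dist(query, points[i]))
    return candidates[:n_neighbors]


def exact_search(query, points, n_neighbors):
    """Brute-force baseline used to measure recall."""
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    return order[:n_neighbors]


def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)
```

Raising `n_probe` trades more cluster reads for higher recall; when every cluster is probed, the approximate result coincides with the brute-force one.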
Pages: 792 - 808
Page count: 17
Related papers
50 entries in total
  • [1] CSVD: Clustering and Singular Value Decomposition for approximate similarity search in high-dimensional spaces
    Castelli, V
    Thomasian, A
    Li, CS
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (03) : 671 - 685
  • [2] Memory Vectors for Similarity Search in High-Dimensional Spaces
    Iscen, Ahmet
    Furon, Teddy
    Gripon, Vincent
    Rabbat, Michael
    Jegou, Herve
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (01) : 65 - 77
  • [3] Quantization techniques for similarity search in high-dimensional data spaces
    Garcia-Arellano, C
    Sevcik, K
    [J]. NEW HORIZONS IN INFORMATION MANAGEMENT, 2003, 2712 : 75 - 94
  • [4] A Group Testing Framework for Similarity Search in High-dimensional Spaces
    Shi, Miaojing
    Furon, Teddy
    Jegou, Herve
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 407 - 416
  • [5] Fast approximate similarity search in extremely high-dimensional data sets
    Houle, ME
    Sakuma, J
    [J]. ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 619 - 630
  • [6] Clustering in high-dimensional data spaces
    Murtagh, FD
    [J]. STATISTICAL CHALLENGES IN ASTRONOMY, 2003, : 279 - 292
  • [7] Fast similarity search for high-dimensional dataset
    Wang, Quan
    You, Suya
    [J]. ISM 2006: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, PROCEEDINGS, 2006, : 799 - +
  • [8] Shape indexing using approximate nearest-neighbour search in high-dimensional spaces
    Beis, JS
    Lowe, DG
    [J]. 1997 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, 1997, : 1000 - 1006
  • [9] Towards Efficient Index Construction and Approximate Nearest Neighbor Search in High-Dimensional Spaces
    Zhao, Xi
    Tian, Yao
    Huang, Kai
    Zheng, Bolong
    Zhou, Xiaofang
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (08) : 1979 - 1991
  • [10] Nearest Neighbor Search in High-Dimensional Spaces
    Andoni, Alexandr
    [J]. MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2011, 2011, 6907 : 1 - 1