CSVD: Clustering and Singular Value Decomposition for approximate similarity search in high-dimensional spaces

Cited by: 45
Authors
Castelli, V
Thomasian, A
Li, CS
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Keywords
multidimensional indexing; singular value decomposition; clustering; multimedia indexing; curse of dimensionality; principal component analysis
DOI
10.1109/TKDE.2003.1198398
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Nearest-neighbor search of high-dimensionality spaces is critical for many applications, such as content-based retrieval from multimedia databases, similarity search of patterns in data mining, and nearest-neighbor classification. Unfortunately, even with the aid of the commonly used indexing schemes, the performance of nearest-neighbor (NN) queries deteriorates rapidly with the number of dimensions. We propose a method, called Clustering with Singular Value Decomposition (CSVD), which supports efficient approximate processing of NN queries while maintaining good precision-recall characteristics. CSVD groups homogeneous points into clusters and separately reduces the dimensionality of each cluster using SVD. Cluster selection for NN queries relies on a branch-and-bound algorithm, and within-cluster searches can be performed with traditional or in-memory indexing methods. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than plain SVD for the same Normalized Mean Squared Error (NMSE), which translates into higher efficiency in processing approximate NN queries.
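The idea of clustering first and then reducing each cluster separately can be illustrated with a minimal sketch. This is not the authors' implementation: the plain k-means step, the synthetic data, the helper names (`kmeans`, `svd_reconstruct`, `csvd_reconstruct`, `nmse`), and the NMSE definition used here (reconstruction error divided by total variance around the global mean) are all assumptions made for illustration. The branch-and-bound cluster selection and the within-cluster index are not shown.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def svd_reconstruct(X, p):
    """Project centered X onto its top-p singular directions and map back."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:p].T
    return (X - mu) @ V @ V.T + mu

def csvd_reconstruct(X, labels, p):
    """Apply the SVD reduction separately inside each cluster."""
    X_hat = np.empty_like(X)
    for j in np.unique(labels):
        idx = labels == j
        X_hat[idx] = svd_reconstruct(X[idx], p)
    return X_hat

def nmse(X, X_hat):
    """Assumed definition: squared error over total variance of X."""
    return ((X - X_hat) ** 2).sum() / ((X - X.mean(axis=0)) ** 2).sum()

# Synthetic data (hypothetical): two groups, each near its own 2-D subspace of a 10-D space.
rng = np.random.default_rng(0)
d, n, p = 10, 200, 2
group_a = rng.normal(size=(n, 2)) @ rng.normal(size=(2, d)) + 5.0
group_b = rng.normal(size=(n, 2)) @ rng.normal(size=(2, d)) - 5.0
X = np.vstack([group_a, group_b]) + 0.01 * rng.normal(size=(2 * n, d))

labels = kmeans(X, k=2)
err_svd = nmse(X, svd_reconstruct(X, p))            # one global SVD
err_csvd = nmse(X, csvd_reconstruct(X, labels, p))  # per-cluster SVD
print(f"global SVD NMSE: {err_svd:.4g}, per-cluster SVD NMSE: {err_csvd:.4g}")
```

With the same number of retained dimensions `p`, the per-cluster reconstruction cannot have a larger error than the global one, because each cluster is free to use its own best-fitting subspace. Read the other way around, a fixed NMSE target can be met with fewer retained dimensions per cluster, which is the effect the abstract reports.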
Pages: 671-685
Page count: 15
Related papers
50 in total
  • [1] CSVD: Approximate similarity searches in high dimensional spaces using clustering and singular value decomposition
    Thomasian, A
    Castelli, V
    Li, CS
    [J]. MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS III, 1998, 3527 : 144 - 154
  • [2] Clustering for approximate similarity search in high-dimensional spaces
    Li, C
    Chang, E
    Garcia-Molina, H
    Wiederhold, G
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (04) : 792 - 808
  • [3] Memory Vectors for Similarity Search in High-Dimensional Spaces
    Iscen, Ahmet
    Furon, Teddy
    Gripon, Vincent
    Rabbat, Michael
    Jegou, Herve
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (01) : 65 - 77
  • [4] Federated singular value decomposition for high-dimensional data
    Hartebrodt, Anne
    Rottger, Richard
    Blumenthal, David B.
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2024, 38 (03) : 938 - 975
  • [5] Federated singular value decomposition for high-dimensional data
    Anne Hartebrodt
    Richard Röttger
    David B. Blumenthal
    [J]. Data Mining and Knowledge Discovery, 2024, 38 : 938 - 975
  • [6] Quantization techniques for similarity search in high-dimensional data spaces
    Garcia-Arellano, C
    Sevcik, K
    [J]. NEW HORIZONS IN INFORMATION MANAGEMENT, 2003, 2712 : 75 - 94
  • [7] A Group Testing Framework for Similarity Search in High-dimensional Spaces
    Shi, Miaojing
    Furon, Teddy
    Jegou, Herve
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 407 - 416
  • [8] Fast approximate similarity search in extremely high-dimensional data sets
    Houle, ME
    Sakuma, J
    [J]. ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 619 - 630
  • [9] A Sparse Singular Value Decomposition Method for High-Dimensional Data
    Yang, Dan
    Ma, Zongming
    Buja, Andreas
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2014, 23 (04) : 923 - 942
  • [10] High-Dimensional Generalized Orthogonal Matching Pursuit With Singular Value Decomposition
    Zong, Zhaoyun
    Fu, Ting
    Yin, Xingyao
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20