Indexing expensive functions for efficient multi-dimensional similarity search

Cited by: 6
Authors
Chen, Hanxiong [1]
Liu, Jianquan [1]
Furuse, Kazutaka [1]
Yu, Jeffrey Xu [2]
Ohbo, Nobuo [1]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Dept Comp Sci, Tsukuba, Ibaraki 3058577, Japan
[2] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Sha Tin, Hong Kong, Peoples R China
Keywords
Similarity search; High-dimensional space; Function index; Nearest-neighbor search; Queries; Tree
DOI
10.1007/s10115-010-0303-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Similarity search is important in information retrieval applications, where objects are usually represented as high-dimensional vectors. This creates a growing need for indexing high-dimensional data. However, index structures based on space partitioning are ineffective because of the well-known "curse of dimensionality", and a linear scan of the data with approximation is more efficient for high-dimensional similarity search. Approaches so far have concentrated on reducing I/O and have ignored the computation cost. For an expensive distance function, such as the L_p norm with fractional p, the computation cost becomes the bottleneck. We propose a new technique that addresses expensive distance functions by "indexing the function": some key values of the function are pre-computed once, and these values are then used to derive upper and lower bounds on the distance between a data vector and the query vector. The technique is extremely efficient because it avoids most of the distance function computations; moreover, it requires no extra secondary storage because no index is constructed and stored. The efficiency is confirmed by cost analysis and by experiments on synthetic and real data.
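The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that all coordinates are normalized to [0, 1], that the "key values" are t^p tabulated on an evenly spaced grid, and that the query is a range query with radius r; the names `build_table`, `bounds`, and `range_query` are hypothetical. Because t^p is increasing for t >= 0, each per-dimension term |q_i - x_i|^p lies between the two tabulated values that bracket it, so summing those values bounds the p-th power of the distance using table lookups only.

```python
import random


def build_table(p, cells):
    """Pre-compute t**p once at cells + 1 evenly spaced points on [0, 1]."""
    return [(k / cells) ** p for k in range(cells + 1)]


def bounds(q, x, table):
    """Bound sum(|q_i - x_i|**p) from below and above with lookups only.

    Assumes every coordinate of q and x lies in [0, 1].
    """
    cells = len(table) - 1
    lo = hi = 0.0
    for qi, xi in zip(q, x):
        # Index of the grid cell that contains |qi - xi|.
        k = min(int(abs(qi - xi) * cells), cells - 1)
        lo += table[k]        # t**p at the left edge of the cell
        hi += table[k + 1]    # t**p at the right edge of the cell
    return lo, hi


def range_query(q, data, r, p, table):
    """Return indices of vectors within L_p distance r of q.

    Also returns how many vectors needed the exact (expensive) evaluation.
    Comparing against r**p avoids taking the 1/p root for every vector.
    """
    threshold = r ** p
    hits, exact_evals = [], 0
    for i, x in enumerate(data):
        lo, hi = bounds(q, x, table)
        if lo > threshold:      # certainly outside the radius
            continue
        if hi <= threshold:     # certainly inside the radius
            hits.append(i)
            continue
        exact_evals += 1        # bounds are inconclusive: refine exactly
        if sum(abs(a - b) ** p for a, b in zip(q, x)) <= threshold:
            hits.append(i)
    return hits, exact_evals


if __name__ == "__main__":
    random.seed(0)
    p = 0.5                                 # fractional p
    table = build_table(p, 1024)            # built once, kept in memory
    data = [[random.random() for _ in range(8)] for _ in range(200)]
    q = [random.random() for _ in range(8)]
    hits, exact_evals = range_query(q, data, 20.0, p, table)
    print(len(hits), "hits;", exact_evals, "exact evaluations of", len(data))
```

A finer grid tightens the bounds and leaves fewer vectors for exact evaluation, at the cost of a larger in-memory table. The table depends only on p, not on the data, which is consistent with the abstract's remark that no index is stored on secondary storage.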
Pages: 165-192
Page count: 28
Source
Knowledge and Information Systems, 2011, 27: 165-192