Indexing expensive functions for efficient multi-dimensional similarity search

Cited by: 6
Authors
Chen, Hanxiong [1]
Liu, Jianquan [1]
Furuse, Kazutaka [1]
Yu, Jeffrey Xu [2]
Ohbo, Nobuo [1]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Dept Comp Sci, Tsukuba, Ibaraki 3058577, Japan
[2] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Sha Tin, Hong Kong, Peoples R China
Keywords
Similarity search; High-dimensional space; Function index; NEAREST-NEIGHBOR SEARCH; QUERIES; TREE;
DOI
10.1007/s10115-010-0303-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Similarity search is important in information retrieval applications, where objects are usually represented as high-dimensional vectors. This creates a growing need for indexing support for high-dimensional data. However, index structures based on space partitioning are ineffective because of the well-known "curse of dimensionality", and a linear scan of the data with approximation is more efficient for high-dimensional similarity search. Approaches so far have concentrated on reducing I/O and have ignored the computation cost. For an expensive distance function, such as the L_p norm with fractional p, the computation cost becomes the bottleneck. We propose a new technique that addresses expensive distance functions by "indexing the function": some key values of the function are pre-computed once and then used to derive upper and lower bounds on the distance between a data vector and the query vector. The technique is extremely efficient because it avoids most of the distance function computations; moreover, it requires no extra secondary storage because no index is constructed and stored. The efficiency is confirmed by cost analysis and by experiments on synthetic and real data.
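The idea of bounding an expensive distance from a few pre-computed function values can be illustrated with a small sketch. This is not the authors' algorithm, only one plausible reading of the abstract under stated assumptions: coordinates are normalized to [0, 1], the key values are t**p on a uniform grid of m cells, and the search is a two-phase nearest-neighbor filter. The names build_table, bounds, exact, and nearest are hypothetical.

```python
import math

def build_table(p, m):
    """Pre-compute t**p once at the m + 1 grid points t = k / m on [0, 1]."""
    return [(k / m) ** p for k in range(m + 1)]

def bounds(x, q, table):
    """Lower/upper bounds on sum(|x_i - q_i|**p) using table lookups only.

    Assumes every coordinate lies in [0, 1]. Because t**p is increasing
    in t, a difference t in the cell [k/m, (k+1)/m] is bracketed by the
    two pre-computed values at the cell ends (up to float rounding).
    """
    m = len(table) - 1
    lo = hi = 0.0
    for xi, qi in zip(x, q):
        t = abs(xi - qi)
        k = min(int(t * m), m - 1)  # index of the grid cell containing t
        lo += table[k]
        hi += table[k + 1]
    return lo, hi

def exact(x, q, p):
    """The expensive function: sum of |x_i - q_i|**p.

    The outer 1/p root is omitted; it is monotone, so the ranking of
    candidates is unchanged.
    """
    return sum(abs(xi - qi) ** p for xi, qi in zip(x, q))

def nearest(data, q, p, table):
    """Nearest neighbor of q; exact distances only for unpruned vectors."""
    # Phase 1: cheap bounds for every vector. The smallest upper bound is
    # a threshold that the true nearest neighbor cannot exceed.
    all_bounds = [bounds(x, q, table) for x in data]
    threshold = min(hi for _, hi in all_bounds)
    # Phase 2: evaluate the expensive function only where the lower bound
    # does not already rule the vector out.
    best, best_d, evaluated = None, math.inf, 0
    for i, (lo, _) in enumerate(all_bounds):
        if lo > threshold:
            continue
        d = exact(data[i], q, p)
        evaluated += 1
        if d < best_d:
            best, best_d = i, d
    return best, best_d, evaluated
```

In this sketch the table depends only on p and the grid size, not on the data, so nothing has to be written to secondary storage; a finer grid tightens the bounds at the cost of a larger in-memory table.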
Pages: 165-192
Page count: 28
Related papers
50 in total
  • [31] Spatial indexing and visualization of large multi-dimensional databases
    Dobos, Laszlo
    Csabai, Istvan
    Trencseni, Marton
    Herczegh, Geza
    Jozsa, Peter
    Purger, Norber
    [J]. ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XVI, 2007, 376 : 629 - +
  • [32] Fast and Adaptive Indexing of Multi-Dimensional Observational Data
    Wang, Sheng
    Maier, David
    Ooi, Beng Chin
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (14) : 1683 - 1694
  • [33] An efficient similarity search based on indexing in large DNA databases
    Jeong, In-Seon
    Park, Kyoung-Wook
    Kang, Seung-Ho
    Lim, Hyeong-Seok
    [J]. COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2010, 34 (02) : 131 - 136
  • [34] Hierarchical indexing structure for efficient similarity search in video retrieval
    Lu, Hong
    Ooi, Beng Chin
    Shen, Heng Tao
    Xue, Xiangyang
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (11) : 1544 - 1559
  • [35] Indexing Dense Nested Metric Spaces for Efficient Similarity Search
    Brisaboa, Nieves R.
    Luaces, Miguel R.
    Pedreira, Oscar
    Places, Angeles S.
    Seco, Diego
    [J]. PERSPECTIVES OF SYSTEMS INFORMATICS, 2010, 5947 : 98 - 109
  • [36] Indexing of Motion Capture Data for Efficient and Fast Similarity Search
    Li, Chuanjun
    Prabhakaran, B.
    [J]. JOURNAL OF COMPUTERS, 2006, 1 (03) : 35 - 42
  • [37] Fast similarity join for multi-dimensional data
    Kalashnikov, Dmitri V.
    Prabhakar, Sunil
    [J]. INFORMATION SYSTEMS, 2007, 32 (01) : 160 - 177
  • [38] SPY-TEC: An efficient indexing method for similarity search in high-dimensional data spaces
    Lee, DH
    Kim, HJ
    [J]. DATA & KNOWLEDGE ENGINEERING, 2000, 34 (01) : 77 - 97
  • [39] Multi Feature Indexing Network MUFIN for Similarity Search Applications
    Zezula, Pavel
    [J]. SOFSEM 2012: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2012, 7147 : 77 - 87
  • [40] Design of Multi-Dimensional Search Queries for Efficient Discovery of Suppliers in the Smart Grid
    Yamashita, Akira
    Kutsuzawa, Ryo
    Takemura, Naoya
    Matsumoto, Jun
    Yamanaka, Naoaki
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SMART GRID COMMUNICATIONS (SMARTGRIDCOMM), 2016,