Near neighbor searching with K nearest references

Cited by: 18
Authors
Chavez, E. [1 ]
Graff, M. [3 ]
Navarro, G. [2 ]
Tellez, E. S. [3 ]
Affiliations
[1] CICESE, Mexico City, DF, Mexico
[2] Univ Chile, Dept Comp Sci, CeBiB Ctr Biotechnol & Bioengn, Santiago, Chile
[3] INFOTEC Catedra CONACyT, Mexico City, DF, Mexico
Keywords
Proximity search; Searching by content in multimedia databases; k nearest neighbors; Indexing metric spaces; SIMILARITY SEARCH; APPROXIMATE; INDEX; REPRESENTATIONS; ALGORITHM; SPACES;
DOI
10.1016/j.is.2015.02.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no index outperforms an exhaustive scan of the database. In recent years, a number of indexes for approximate proximity searching have been proposed. These are able to cope with the curse of dimensionality in exchange for returning an answer that might be slightly different from the correct one. In this paper we show that many of those recent indexes can be understood as variants of a simple general model based on K-nearest reference signatures. A set of references is chosen from the database, and the signature of each object consists of the K references nearest to the object. At query time, the signature of the query is computed and the search examines only the objects whose signature is close enough to that of the query. Many known and novel indexes are obtained by considering different ways to determine how much detail the signature records (e.g., just the set of nearest references, or also their proximity order to the object, or also their distances to the object, and so on), how the similarity between signatures is defined, and how the parameters are tuned. In addition, we introduce a space-efficient representation for those families of indexes, making it possible to search very large databases in main memory. Small indexes are cache friendly, inducing faster queries. We perform exhaustive experiments comparing several known and new indexes that derive from our framework, evaluating their time performance, memory usage, and quality of approximation. The best indexes outperform the state of the art, offering an attractive balance between all these aspects, and turn out to be excellent choices in many scenarios. Our framework gives high flexibility in designing new indexes. (C) 2015 Elsevier Ltd. All rights reserved.
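The general model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean distance, random selection of references, a signature that records only the set of the K nearest references, and signature similarity measured as the number of shared references. The function names and the `min_shared` threshold are hypothetical, chosen for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signature(obj, references, K, dist=euclidean):
    """Set of indices of the K references nearest to obj (order not kept)."""
    order = sorted(range(len(references)),
                   key=lambda r: dist(obj, references[r]))
    return frozenset(order[:K])


def build_index(database, num_refs, K, dist=euclidean, seed=0):
    """Pick references at random (an assumption) and sign every object."""
    rng = random.Random(seed)
    ref_ids = rng.sample(range(len(database)), num_refs)
    references = [database[i] for i in ref_ids]
    signatures = [signature(obj, references, K, dist) for obj in database]
    return references, signatures


def search(query, database, references, signatures, K, k,
           min_shared=1, dist=euclidean):
    """Approximate k-NN: examine only objects whose signature shares at
    least min_shared references with the query signature, then rank
    those candidates by their true distance to the query."""
    q_sig = signature(query, references, K, dist)
    candidates = [i for i, s in enumerate(signatures)
                  if len(q_sig & s) >= min_shared]
    candidates.sort(key=lambda i: dist(query, database[i]))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    db = [(rng.random(), rng.random()) for _ in range(1000)]
    refs, sigs = build_index(db, num_refs=32, K=4)
    print(search((0.5, 0.5), db, refs, sigs, K=4, k=5, min_shared=2))
```

Raising `min_shared` shrinks the candidate set, which makes queries faster but increases the chance of missing true neighbors. Recording the proximity order or the distances to the references, as the abstract mentions, would replace the set intersection with a finer similarity between signatures.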
Pages: 43-61
Number of pages: 19