Near neighbor searching with K nearest references

Cited by: 18
Authors
Chavez, E. [1]
Graff, M. [3]
Navarro, G. [2]
Tellez, E. S. [3]
Affiliations
[1] CICESE, Mexico City, DF, Mexico
[2] Univ Chile, Dept Comp Sci, CeBiB Ctr Biotechnol & Bioengn, Santiago, Chile
[3] INFOTEC Catedra CONACyT, Mexico City, DF, Mexico
Keywords
Proximity search; Searching by content in multimedia databases; k nearest neighbors; Indexing metric spaces; SIMILARITY SEARCH; APPROXIMATE; INDEX; REPRESENTATIONS; ALGORITHM; SPACES;
DOI
10.1016/j.is.2015.02.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no index outperforms an exhaustive scan of the database. In recent years, a number of indexes for approximate proximity searching have been proposed. These are able to cope with the curse of dimensionality in exchange for returning an answer that might be slightly different from the correct one. In this paper we show that many of those recent indexes can be understood as variants of a simple general model based on K-nearest reference signatures. A set of references is chosen from the database, and the signature of each object consists of the K references nearest to the object. At query time, the signature of the query is computed and the search examines only the objects whose signature is close enough to that of the query. Many known and novel indexes are obtained by considering different ways to determine how much detail the signature records (e.g., just the set of nearest references, or also their proximity order to the object, or also their distances to the object, and so on), how the similarity between signatures is defined, and how the parameters are tuned. In addition, we introduce a space-efficient representation for those families of indexes, making it possible to search very large databases in main memory. Small indexes are cache friendly, inducing faster queries. We perform exhaustive experiments comparing several known and new indexes that derive from our framework, evaluating their time performance, memory usage, and quality of approximation. The best indexes outperform the state of the art, offering an attractive balance between all these aspects, and turn out to be excellent choices in many scenarios. Our framework offers great flexibility for designing new indexes.
(C) 2015 Elsevier Ltd. All rights reserved.
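The general model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean points, the simplest signature variant (just the set of the K nearest references), and a signature similarity defined as the number of shared references. The names `signature`, `build_index`, `search`, and the parameter `min_shared` are hypothetical and chosen only for this illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signature(obj, references, k, dist=euclidean):
    """Return the ids of the k references nearest to obj, as a set.

    Assumption: only set membership is recorded, not the proximity
    order or the distances (other variants would store those too).
    """
    ranked = sorted(range(len(references)),
                    key=lambda i: dist(obj, references[i]))
    return frozenset(ranked[:k])


def build_index(database, references, k, dist=euclidean):
    """Precompute one signature per database object."""
    return [signature(obj, references, k, dist) for obj in database]


def search(query, database, references, index, k,
           min_shared, n_results, dist=euclidean):
    """Approximate search: examine only objects whose signature shares
    at least min_shared references with the query's signature, then
    rank those candidates by their true distance to the query."""
    q_sig = signature(query, references, k, dist)
    candidates = [i for i, sig in enumerate(index)
                  if len(sig & q_sig) >= min_shared]
    candidates.sort(key=lambda i: dist(query, database[i]))
    return candidates[:n_results]


# Toy data: 200 random 2-D points, 16 of them chosen as references.
random.seed(0)
database = [(random.random(), random.random()) for _ in range(200)]
references = random.sample(database, 16)
index = build_index(database, references, k=4)

# Query with an object that is itself in the database.
result = search(database[0], database, references, index,
                k=4, min_shared=2, n_results=5)
print(result)
```

Raising `min_shared` shrinks the candidate set, which makes queries cheaper but may miss true neighbors; this is the kind of speed versus quality tuning the abstract refers to, shown here only in its simplest form.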
Pages: 43-61
Page count: 19