Near neighbor searching with K nearest references

Cited by: 18
Authors
Chavez, E. [1]
Graff, M. [3]
Navarro, G. [2]
Tellez, E. S. [3]
Affiliations
[1] CICESE, Mexico City, DF, Mexico
[2] Univ Chile, Dept Comp Sci, CeBiB Ctr Biotechnol & Bioengn, Santiago, Chile
[3] INFOTEC Catedra CONACyT, Mexico City, DF, Mexico
Keywords
Proximity search; Searching by content in multimedia databases; k nearest neighbors; Indexing metric spaces; SIMILARITY SEARCH; APPROXIMATE; INDEX; REPRESENTATIONS; ALGORITHM; SPACES
DOI
10.1016/j.is.2015.02.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no index outperforms an exhaustive scan of the database. In recent years, a number of indexes for approximate proximity searching have been proposed. These are able to cope with the curse of dimensionality in exchange for returning an answer that might be slightly different from the correct one. In this paper we show that many of those recent indexes can be understood as variants of a simple general model based on K-nearest reference signatures. A set of references is chosen from the database, and the signature of each object consists of the K references nearest to the object. At query time, the signature of the query is computed and the search examines only the objects whose signature is close enough to that of the query. Many known and novel indexes are obtained by considering different ways to determine how much detail the signature records (e.g., just the set of nearest references, or also their proximity order to the object, or also their distances to the object, and so on), how the similarity between signatures is defined, and how the parameters are tuned. In addition, we introduce a space-efficient representation for those families of indexes, making it possible to search very large databases in main memory. Small indexes are cache friendly, inducing faster queries. We perform exhaustive experiments comparing several known and new indexes that derive from our framework, evaluating their time performance, memory usage, and quality of approximation. The best indexes outperform the state of the art, offering an attractive balance between all these aspects, and turn out to be excellent choices in many scenarios. Our framework gives high flexibility to design new indexes. (C) 2015 Elsevier Ltd. All rights reserved.
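The general model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the simplest signature variant mentioned above (just the set of the K nearest references), it assumes signature similarity is the number of shared references, and it scans all signatures linearly instead of using the space-efficient representation the paper introduces. All names (`signature`, `build_index`, `search`, `min_shared`) are hypothetical and chosen only for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signature(obj, references, k, dist=euclidean):
    """Return the ids of the k references nearest to obj, as a set.

    Assumption: only set membership is recorded, not the proximity
    order or the distances to the references.
    """
    order = sorted(range(len(references)),
                   key=lambda i: dist(obj, references[i]))
    return frozenset(order[:k])


def build_index(database, references, k, dist=euclidean):
    """Compute one signature per database object."""
    return [signature(obj, references, k, dist) for obj in database]


def search(query, database, references, index, k, n_results,
           min_shared=1, dist=euclidean):
    """Approximate nearest neighbor search.

    Only objects whose signature shares at least min_shared references
    with the query signature are compared against the query directly.
    """
    q_sig = signature(query, references, k, dist)
    candidates = [i for i, sig in enumerate(index)
                  if len(sig & q_sig) >= min_shared]
    candidates.sort(key=lambda i: dist(query, database[i]))
    return candidates[:n_results]


if __name__ == "__main__":
    rng = random.Random(0)
    database = [(rng.random(), rng.random()) for _ in range(1000)]
    references = rng.sample(database, 32)  # references come from the database
    index = build_index(database, references, k=4)
    print(search((0.5, 0.5), database, references, index, k=4, n_results=5))
```

Raising `min_shared` shrinks the candidate set, which speeds up the query but makes it more likely that a true nearest neighbor is missed; this is one of the tuning knobs the abstract alludes to.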
Pages: 43-61
Number of pages: 19