Near neighbor searching with K nearest references

Cited by: 18

Authors:

Chavez, E. [1]

Graff, M. [3]

Navarro, G. [2]

Tellez, E. S. [3]

Affiliations:

[1] CICESE, Mexico City, DF, Mexico

[2] Univ Chile, Dept Comp Sci, CeBiB Ctr Biotechnol & Bioengn, Santiago, Chile

[3] INFOTEC Catedra CONACyT, Mexico City, DF, Mexico

Source:

INFORMATION SYSTEMS | 2015 / Vol. 51

Keywords:

Proximity search; Searching by content in multimedia databases; k nearest neighbors; Indexing metric spaces; SIMILARITY SEARCH; APPROXIMATE; INDEX; REPRESENTATIONS; ALGORITHM; SPACES;

DOI:

10.1016/j.is.2015.02.001

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no index outperforms an exhaustive scan of the database. In recent years, a number of indexes for approximate proximity searching have been proposed. These are able to cope with the curse of dimensionality in exchange for returning an answer that might be slightly different from the correct one. In this paper we show that many of those recent indexes can be understood as variants of a simple general model based on K-nearest reference signatures. A set of references is chosen from the database, and the signature of each object consists of the K references nearest to the object. At query time, the signature of the query is computed and the search examines only the objects whose signature is close enough to that of the query. Many known and novel indexes are obtained by considering different ways to determine how much detail the signature records (e.g., just the set of nearest references, or also their proximity order to the object, or also their distances to the object, and so on), how the similarity between signatures is defined, and how the parameters are tuned. In addition, we introduce a space-efficient representation for those families of indexes, making it possible to search very large databases in main memory. Small indexes are cache friendly, inducing faster queries. We perform exhaustive experiments comparing several known and new indexes that derive from our framework, evaluating their time performance, memory usage, and quality of approximation. The best indexes outperform the state of the art, offering an attractive balance between all these aspects, and turn out to be excellent choices in many scenarios. Our framework gives high flexibility to design new indexes. (C) 2015 Elsevier Ltd. All rights reserved.
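The general model described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes Euclidean distance, records only the set of the K nearest references (the simplest signature variant the abstract mentions), and treats two signatures as "close enough" when they share at least `min_shared` references. All function and parameter names (`knr_signature`, `build_index`, `search`, `min_shared`, `n_neighbors`) are hypothetical, and the linear scan over signatures stands in for the space-efficient representation the paper introduces.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knr_signature(obj, references, k, dist=euclidean):
    """Return the ids of the k references nearest to obj, as a set.

    Assumption: only the set of references is kept, not their order
    or their distances to obj.
    """
    order = sorted(range(len(references)),
                   key=lambda i: dist(obj, references[i]))
    return frozenset(order[:k])


def build_index(database, references, k, dist=euclidean):
    """Precompute one signature per database object."""
    return [knr_signature(obj, references, k, dist) for obj in database]


def search(query, database, references, signatures,
           k, min_shared, n_neighbors, dist=euclidean):
    """Approximate nearest neighbor search.

    Only objects whose signature shares at least min_shared references
    with the query signature are compared against the query directly.
    """
    q_sig = knr_signature(query, references, k, dist)
    candidates = [i for i, sig in enumerate(signatures)
                  if len(sig & q_sig) >= min_shared]
    candidates.sort(key=lambda i: dist(query, database[i]))
    return candidates[:n_neighbors]


# Toy data: two clusters and three references (chosen by hand here).
database = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0),
            (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
references = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
signatures = build_index(database, references, k=1)
print(search((0.1, 0.1), database, references, signatures,
             k=1, min_shared=1, n_neighbors=2))
```

In this sketch, raising `min_shared` shrinks the candidate set (faster queries, lower recall), while raising `k` makes signatures more detailed; these correspond loosely to the tuning parameters the abstract refers to.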

Pages: 43-61

Page count: 19
