Entropy based Nearest Neighbor Search in High Dimensions

Cited by: 116
Author
Panigrahy, Rina [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
DOI
10.1145/1109557.1109688
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper we study the problem of finding the approximate nearest neighbor of a query point in a high-dimensional space, focusing on Euclidean space. Earlier approaches use locality-preserving hash functions (which tend to map nearby points to the same value) to construct several hash tables, ensuring that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points required in the neighborhood of the query point $q$ depends on the entropy of the hash value $h(p)$ of a random point $p$ at the same distance from $q$ as its nearest neighbor, given $q$ and the locality-preserving hash function $h$ chosen randomly from the hash family. Precisely, we show that if the entropy $I(h(p) \mid q, h) = M$ and $g$ is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in $O(n^{\rho})$ time and near-linear $\tilde{O}(n)$ space, where $\rho = M/\log(1/g)$. Alternatively, we can build a data structure of size $O(n^{1/(1-\rho)})$ to answer queries in $O(d)$ time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the $c$-approximate nearest neighbor can be computed in time $O(n^{\rho})$ and near-linear space, where $\rho \approx 2.06/c$ as $c$ becomes large.
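The query strategy described in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a grid-projection hash (random Gaussian projections quantized to width `w`) as a stand-in for the hash families in [17, 21, 6], and the names `PerturbedQueryIndex`, `rho_exponent`, `k`, `w`, and `num_probes` are hypothetical. The helper `rho_exponent` evaluates $\rho = M/\log(1/g)$ under the assumption that $M$ is measured in bits and the logarithm is base 2.

```python
import math
import random
from collections import defaultdict


def rho_exponent(entropy_bits, g):
    """Evaluate rho = M / log(1/g), assuming M in bits and a base-2 log."""
    return entropy_bits / math.log2(1.0 / g)


class PerturbedQueryIndex:
    """One hash table, queried with random points near the query (a sketch)."""

    def __init__(self, points, k=4, w=4.0, seed=0):
        self.rng = random.Random(seed)
        self.points = points
        self.w = w
        dim = len(points[0])
        # Assumed hash family: k random Gaussian projections, each shifted by
        # a random offset and quantized into cells of width w.
        self.dirs = [[self.rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]
        self.offsets = [self.rng.uniform(0, w) for _ in range(k)]
        self.table = defaultdict(list)
        for i, p in enumerate(points):
            self.table[self._hash(p)].append(i)

    def _hash(self, x):
        return tuple(
            math.floor((sum(a * xi for a, xi in zip(d, x)) + b) / self.w)
            for d, b in zip(self.dirs, self.offsets)
        )

    def _random_point_at_distance(self, q, r):
        # A normalized Gaussian vector is uniform on the sphere of radius r.
        v = [self.rng.gauss(0, 1) for _ in q]
        norm = math.sqrt(sum(c * c for c in v))
        return [qi + r * c / norm for qi, c in zip(q, v)]

    def query(self, q, r, num_probes=50):
        """Hash q and num_probes random points at distance r from q, then
        return the index of the closest candidate found (None if no hit)."""
        candidates = set(self.table.get(self._hash(q), []))
        for _ in range(num_probes):
            probe = self._random_point_at_distance(q, r)
            candidates.update(self.table.get(self._hash(probe), []))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(q, self.points[i]))
```

In this sketch, querying with a point that is already indexed returns that point's own index, because its bucket is always probed first. For other queries, `r` plays the role of the distance to the nearest neighbor, and `num_probes` would be chosen according to the entropy bound stated in the abstract; the default here is arbitrary.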
Pages: 1186-1195
Page count: 10
Related papers
50 in total
  • [31] Fast nearest neighbor search in high-dimensional space
    Berchtold, S
    Ertl, B
    Keim, DA
    Kriegel, HP
    Seidl, T
    14TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1998, : 209 - 218
  • [32] High-Dimensional Nearest Neighbor Search-Based Blocking in Entity Resolution
    Zhang, Kaiyu
    Sun, Chenchen
    Shen, Derong
    Nie, Tiezheng
    Kou, Yue
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 215 - 226
  • [33] An efficient searching algorithm for approximate nearest neighbor queries in high dimensions
    Pramanik, S
    Alexander, S
    Li, JH
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, PROCEEDINGS VOL 1, 1999, : 865 - 869
  • [34] Grid interpolation algorithm based on nearest neighbor fast search
    Huang, Hao
    Cui, Can
    Cheng, Liang
    Liu, Qiang
    Wang, Jiechen
    EARTH SCIENCE INFORMATICS, 2012, 5 (3-4) : 181 - 187
  • [35] Multiple k nearest neighbor search
    Yu-Chi Chung
    I-Fang Su
    Chiang Lee
    Pei-Chi Liu
    World Wide Web, 2017, 20 : 371 - 398
  • [36] Authenticated Multistep Nearest Neighbor Search
    Papadopoulos, Stavros
    Wang, Lixing
    Yang, Yin
    Papadias, Dimitris
    Karras, Panagiotis
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (05) : 641 - 654
  • [37] Fast Instance Search Based on Approximate Bichromatic Reverse Nearest Neighbor Search
    Iwamura, Masakazu
    Matozaki, Nobuaki
    Kise, Koichi
    PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 1121 - 1124
  • [38] Efficient Foreign Key Discovery Based on Nearest Neighbor Search
    Yuan, Xiaojie
    Cai, Xiangrui
    Yu, Man
    Wang, Chao
    Zhang, Ying
    Wen, Yanlong
    WEB-AGE INFORMATION MANAGEMENT (WAIM 2015), 2015, 9098 : 443 - 447
  • [39] A nearest neighbor search method for image matching based on ORB
    Institute of Image Processing and Pattern Recognition, North China University of Technology, Beijing, China
    J. Inf. Comput. Sci., 7 (2691-2700):
  • [40] Projection Search For Approximate Nearest Neighbor
    Feng, Cheng
    Yang, Bo
    2016 17TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2016, : 33 - 38