Entropy based Nearest Neighbor Search in High Dimensions

Cited by: 116
Authors
Panigrahy, Rina [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
DOI
10.1145/1109557.1109688
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this paper we study the problem of finding the approximate nearest neighbor of a query point in high-dimensional space, focusing on Euclidean space. Earlier approaches use locality-preserving hash functions (which tend to map nearby points to the same value) to construct several hash tables, ensuring that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points required in the neighborhood of the query point $q$ depends on the entropy of the hash value $h(p)$ of a random point $p$ at the same distance from $q$ as its nearest neighbor, given $q$ and the locality-preserving hash function $h$ chosen randomly from the hash family. Precisely, we show that if the entropy $I(h(p) \mid q, h) = M$ and $g$ is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in $O(n^{\rho})$ time and near-linear $\tilde{O}(n)$ space, where $\rho = M/\log(1/g)$. Alternatively, we can build a data structure of size $O(n^{1/(1-\rho)})$ to answer queries in $O(d)$ time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the $c$-approximate nearest neighbor can be computed in time $O(n^{\rho})$ and near-linear space, where $\rho \approx 2.06/c$ as $c$ becomes large.
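The probing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's construction: the hash family here (random Gaussian projections quantized into cells of width `w`) is an assumed stand-in for a locality-preserving hash, and all function names and parameters (`make_hash`, `random_point_at_distance`, `build_table`, `query`, `k`, `w`, `num_probes`) are hypothetical. It shows only the overall shape of the method: build a single hash table, then at query time hash several random points at distance `r` from the query and inspect the buckets they land in.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, k, w, rng):
    """Return h(x): k random Gaussian projections, each quantized into cells of width w.

    Assumed stand-in for a locality-preserving hash; not the paper's exact family.
    """
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    offsets = [rng.uniform(0.0, w) for _ in range(k)]

    def h(x):
        return tuple(
            math.floor((sum(a * xi for a, xi in zip(plane, x)) + b) / w)
            for plane, b in zip(planes, offsets)
        )

    return h


def random_point_at_distance(q, r, rng):
    """Sample a point uniformly from the sphere of radius r centered at q."""
    g = [rng.gauss(0.0, 1.0) for _ in q]
    norm = math.sqrt(sum(v * v for v in g))
    return [qi + r * v / norm for qi, v in zip(q, g)]


def build_table(points, h):
    """Build a single hash table mapping bucket -> list of point indices."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        table[h(p)].append(idx)
    return table


def query(q, points, table, h, r, num_probes, rng):
    """Probe the bucket of q and of num_probes random points at distance r from q.

    Returns the index of the closest candidate found, or None if every
    probed bucket is empty.
    """
    candidates = set(table.get(h(q), []))
    for _ in range(num_probes):
        probe = random_point_at_distance(q, r, rng)
        candidates.update(table.get(h(probe), []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(q, points[i]))
```

In this sketch `num_probes` plays the role of the number of randomly chosen neighborhood points; the abstract's analysis ties the number actually required to the entropy $M$, which the sketch does not compute. A caller would build the table once with `build_table(points, make_hash(dim, k, w, rng))` and then call `query` with `r` set to the assumed nearest-neighbor distance.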
Pages: 1186-1195
Page count: 10
Related papers
50 in total
  • [1] A simple algorithm for nearest neighbor search in high dimensions
    Nene, SA
    Nayar, SK
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (09) : 989 - 1003
  • [2] Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions
    Buaba, Ruben
    Homaifar, Abdollah
    Hendrix, William
    Son, Seung Woo
    Liao, Wei-keng
    Choudhary, Alok
    JOURNAL OF PATTERN RECOGNITION RESEARCH, 2014, 9 (01): : 111 - 122
  • [3] PARALLEL ALGORITHMS FOR NEAREST NEIGHBOR SEARCH PROBLEMS IN HIGH DIMENSIONS
    Xiao, Bo
    Biros, George
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2016, 38 (05): : S667 - S699
  • [4] Probably correct k-nearest neighbor search in high dimensions
    Toyama, Jun
    Kudo, Mineichi
    Imai, Hideyuki
    PATTERN RECOGNITION, 2010, 43 (04) : 1361 - 1372
  • [5] Fast approximate search algorithm for nearest neighbor queries in high dimensions
    Pramanik, S
    Li, JH
    15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, : 251 - 251
  • [6] Transform Coding for Fast Approximate Nearest Neighbor Search in High Dimensions
    Brandt, Jonathan
    2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010, : 1815 - 1822
  • [7] Approximate Line Nearest Neighbor in High Dimensions
    Andoni, Alexandr
    Indyk, Piotr
    Krauthgamer, Robert
    Nguyen, Huy L.
    PROCEEDINGS OF THE TWENTIETH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2009, : 293 - +
  • [8] DDS: An efficient dynamic dimension selection algorithm for nearest neighbor search in high dimensions
    Kuo, CC
    Chen, MS
    2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 999 - 1002
  • [9] Approximate all nearest neighbor search for high dimensional entropy estimation for image registration
    Kybic, Jan
    Vnucko, Ivan
    SIGNAL PROCESSING, 2012, 92 (05) : 1302 - 1316
  • [10] ON THE PERFORMANCE OF EDITED NEAREST NEIGHBOR RULES IN HIGH DIMENSIONS
    BRODER, AZ
    BRUCKSTEIN, AM
    KOPLOWITZ, J
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1985, 15 (01): : 136 - 139