Reporting Neighbors in High-Dimensional Euclidean Space

Cited by: 0
Authors
Aiger, Dror [1]
Kaplan, Haim [2]
Sharir, Micha [2, 3]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
[2] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[3] NYU, Courant Inst Math Sci, New York, NY 10012 USA
Keywords
APPROXIMATE NEAREST-NEIGHBOR; OPTIMAL HASHING ALGORITHMS
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We consider the following problem, which arises in many database and web-based applications: Given a set $P$ of $n$ points in a high-dimensional space $\mathbb{R}^d$ and a distance $r$, we want to report all pairs of points of $P$ at Euclidean distance at most $r$. We present two randomized algorithms, one based on randomly shifted grids, and the other on randomly shifted and rotated grids. The running time of both algorithms is of the form $C(d)(n + k)\log n$, where $k$ is the output size and $C(d)$ is a constant that depends on the dimension $d$. The $\log n$ factor is needed to guarantee, with high probability, that all neighbor pairs are reported, and can be dropped if it suffices to report, in expectation, an arbitrarily large fraction of the pairs. When only translations are used, $C(d)$ is of the form $(a\sqrt{d})^d$, for some (small) absolute constant $a \approx 0.484$; this bound is worst-case tight, up to an exponential factor of about $2^d$. When both rotations and translations are used, $C(d)$ can be improved to roughly $6.74^d$, getting rid of the super-exponential factor $\sqrt{d}^{\,d}$. When the input set (lies in a subset of $d$-space that) has low doubling dimension $\delta$, the performance of the first algorithm improves to $C(d, \delta)(n + k)\log n$ (or to $C(d, \delta)(n + k)$), where $C(d, \delta) = O((ed/\delta)^{\delta})$, for $\delta \le \sqrt{d}$. Otherwise, $C(d, \delta) = O(e^{\sqrt{d}} \sqrt{d}^{\,\delta})$. We also present experimental results on several large datasets, demonstrating that our algorithms run significantly faster than all the leading existing algorithms for reporting neighbors.
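The shifted-grid idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `report_close_pairs` and the parameters `cell_side` and `num_shifts` are hypothetical, and the abstract does not say how the paper chooses the cell size or the number of shifts. The sketch hashes every point into a randomly translated grid, compares only points that share a cell, and repeats with fresh shifts to raise the chance that each close pair is caught.

```python
import itertools
import math
import random
from collections import defaultdict


def report_close_pairs(points, r, cell_side, num_shifts, seed=0):
    """Return index pairs (i, j), i < j, at Euclidean distance <= r that
    share a cell in at least one randomly shifted grid.

    cell_side and num_shifts are illustrative knobs (assumptions), not
    values taken from the paper.
    """
    rng = random.Random(seed)
    d = len(points[0])
    found = set()
    for _ in range(num_shifts):
        # One independent random translation per coordinate axis.
        shift = [rng.uniform(0.0, cell_side) for _ in range(d)]
        cells = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(math.floor((x + s) / cell_side)
                        for x, s in zip(p, shift))
            cells[key].append(idx)
        # Only points in the same cell are compared with each other.
        for members in cells.values():
            for i, j in itertools.combinations(members, 2):
                if math.dist(points[i], points[j]) <= r:
                    found.add((i, j))
    return found


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]
    print(report_close_pairs(pts, r=1.0, cell_side=2.0, num_shifts=1))
```

Every reported pair is verified with an exact distance check, so the sketch never returns a false pair; it can only miss pairs that a grid boundary separates in every shift. For one shift, two points whose coordinates differ by $\delta_i < s$ along axis $i$ (with $s$ the cell side) share a cell with probability $\prod_i (1 - \delta_i/s)$, which is why repeating the shift helps.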
Pages: 784-803
Number of pages: 20