Reporting Neighbors in High-Dimensional Euclidean Space

Cited by: 0

Authors:

Aiger, Dror [1]

Kaplan, Haim [2]

Sharir, Micha [2,3]

Affiliations:

[1] Google Inc, Mountain View, CA 94043 USA

[2] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel

[3] NYU, Courant Inst Math Sci, New York, NY 10012 USA

Source:

PROCEEDINGS OF THE TWENTY-FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA 2013) | 2013

Keywords:

APPROXIMATE NEAREST-NEIGHBOR; OPTIMAL HASHING ALGORITHMS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202 ;

Abstract:

We consider the following problem, which arises in many database and web-based applications: Given a set $P$ of $n$ points in a high-dimensional space $\mathbb{R}^d$ and a distance $r$, we want to report all pairs of points of $P$ at Euclidean distance at most $r$. We present two randomized algorithms, one based on randomly shifted grids, and the other on randomly shifted and rotated grids. The running time of both algorithms is of the form $C(d)(n + k) \log n$, where $k$ is the output size and $C(d)$ is a constant that depends on the dimension $d$. The $\log n$ factor is needed to guarantee, with high probability, that all neighbor pairs are reported, and can be dropped if it suffices to report, in expectation, an arbitrarily large fraction of the pairs. When only translations are used, $C(d)$ is of the form $(a\sqrt{d})^d$, for some (small) absolute constant $a \approx 0.484$; this bound is worst-case tight, up to an exponential factor of about $2^d$. When both rotations and translations are used, $C(d)$ can be improved to roughly $6.74^d$, getting rid of the super-exponential factor $\sqrt{d}^{\,d}$. When the input set (lies in a subset of $d$-space that) has low doubling dimension $\delta$, the performance of the first algorithm improves to $C(d, \delta)(n + k) \log n$ (or to $C(d, \delta)(n + k)$), where $C(d, \delta) = O((ed/\delta)^{\delta})$, for $\delta \le \sqrt{d}$. Otherwise, $C(d, \delta) = O(e^{\sqrt{d}} \sqrt{d}^{\,\delta})$. We also present experimental results on several large datasets, demonstrating that our algorithms run significantly faster than all the leading existing algorithms for reporting neighbors.
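
The shifted-grid idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the cell size or the number of repetitions, so `cell_side`, `trials`, and `seed` are hypothetical parameters chosen by the caller, and the function names are illustrative only. Each trial shifts an axis-parallel grid by a random vector, buckets the points by cell, and compares only points that share a cell.

```python
import itertools
import math
import random
from collections import defaultdict


def report_close_pairs(points, r, cell_side, trials, seed=0):
    """Report index pairs (i, j), i < j, at Euclidean distance <= r.

    Monte Carlo sketch: every reported pair is correct, but a true pair
    is missed if no trial places both of its points in the same cell.
    cell_side and trials are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    d = len(points[0]) if points else 0
    found = set()
    for _ in range(trials):
        # One independent uniform shift per coordinate.
        shift = [rng.uniform(0.0, cell_side) for _ in range(d)]
        cells = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(math.floor((x + s) / cell_side)
                        for x, s in zip(p, shift))
            cells[key].append(idx)
        # Only points that fall in the same cell are compared.
        for bucket in cells.values():
            for i, j in itertools.combinations(bucket, 2):
                if math.dist(points[i], points[j]) <= r:
                    found.add((i, j))
    return found


def brute_force_pairs(points, r):
    """Quadratic-time reference used only to check the sketch."""
    return {(i, j)
            for i, j in itertools.combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= r}
```

In this sketch, a larger `cell_side` makes it more likely that a close pair shares a cell in a given trial but puts more points in each cell, while more `trials` lowers the chance of missing a pair at the cost of repeated work. This mirrors, loosely, the abstract's remark that the $\log n$ factor pays for reporting all pairs with high probability.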

Pages: 784-803

Page count: 20

50 items in total

[1] REPORTING NEIGHBORS IN HIGH-DIMENSIONAL EUCLIDEAN SPACE
Aiger, Dror
Kaplan, Haim
Sharir, Micha
[J]. SIAM JOURNAL ON COMPUTING, 2014, 43 (04) : 1363 - 1395
[2] Online search for a hyperplane in high-dimensional Euclidean space
Antoniadis, Antonios
Hoeksma, Ruben
Kisfaludi-Bak, Sandor
Schewior, Kevin
[J]. INFORMATION PROCESSING LETTERS, 2022, 177
[3] Hubs in space: Popular nearest neighbors in high-dimensional data
Radovanović, Miloš
Nanopoulos, Alexandros
Ivanović, Mirjana
[J]. Journal of Machine Learning Research, 2010, 11 : 2487 - 2531
[4] Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
Radovanovic, Milos
Nanopoulos, Alexandros
Ivanovic, Mirjana
[J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11 : 2487 - 2531
[5] Reflection-Like Maps in High-Dimensional Euclidean Space
Huang, Zhiyong
Li, Baokui
[J]. MATHEMATICS, 2020, 8 (06)
[6] Homology of moduli spaces of linkages in high-dimensional Euclidean space
Schuetz, Dirk
[J]. ALGEBRAIC AND GEOMETRIC TOPOLOGY, 2013, 13 (02) : 1183 - 1224
[7] An effective method for approximating the Euclidean distance in high-dimensional space
Jeong, Seungdo
Kim, Sang-Wook
Kim, Kidong
Choi, Byung-Uk
[J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, 4080 : 863 - 872
[8] Polynomial approximate discretization of geometric centers in high-dimensional Euclidean space
Vladimir Shenmaier
[J]. Advances in Data Analysis and Classification, 2022, 16 : 1039 - 1067
[9] Polynomial approximate discretization of geometric centers in high-dimensional Euclidean space
Shenmaier, Vladimir
[J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2022, 16 (04) : 1039 - 1067
[10] A Structural Theorem for Center-Based Clustering in High-Dimensional Euclidean Space
Shenmaier, Vladimir
[J]. MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, 2019, 11943 : 284 - 295