Data-Dependent Hashing via Nonlinear Spectral Gaps

Cited by: 14
Authors
Andoni, Alexandr [1]
Naor, Assaf [2]
Nikolov, Aleksandar [3]
Razenshteyn, Ilya [4]
Waingarten, Erik [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Princeton Univ, Princeton, NJ 08544 USA
[3] Univ Toronto, Toronto, ON, Canada
[4] Microsoft Res Redmond, Redmond, WA USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Nearest neighbor search; nonlinear spectral gaps; randomized space partitions; locality-sensitive hashing; NEAREST-NEIGHBOR; APPROXIMATE; EXPANDERS;
DOI
10.1145/3188745.3188846
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We establish a generic reduction from nonlinear spectral gaps of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: For general $d$-dimensional normed spaces and $n$-point datasets, we obtain a cell-probe ANN data structure with approximation $O(\log d / \varepsilon^2)$, space $d^{O(1)} n^{1+\varepsilon}$, and $d^{O(1)} n^{\varepsilon}$ cell probes per query, for any $\varepsilon > 0$. No non-trivial approximation was known before in this generality other than the $O(\sqrt{d})$ bound which follows from embedding a general norm into $\ell_2$. For $\ell_p$ and Schatten-$p$ norms, we improve the data structure further, to obtain approximation $O(p)$ and sublinear query time. For $\ell_p$, this improves upon the previous best approximation $2^{O(p)}$ (which required space polynomial, as opposed to near-linear, in $n$). For the Schatten-$p$ norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a "tractable" space, such as $\ell_1$. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.
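The abstract does not define a nonlinear spectral gap. As a rough illustration only, the sketch below assumes one common Poincaré-type formulation: for a graph on n vertices and a metric d, the gap is the supremum, over all non-constant placements of the vertices in the metric space, of the ratio between the average of d^p over all ordered vertex pairs and the average of d^p over the edges. Normalization conventions vary between papers, and the function name `poincare_ratio` and the toy inputs are hypothetical, not taken from the paper. Evaluating the ratio on one placement gives only a lower bound on the gap.

```python
import math


def poincare_ratio(points, edges, dist, p=2):
    """Ratio of the all-pairs average of dist**p to the edge average of dist**p.

    Assumed convention (not from the paper): the all-pairs average runs over
    all n*n ordered pairs, including i == j; the edge average runs over the
    given edge list. The supremum of this ratio over all placements is one
    common definition of the nonlinear spectral gap of the graph.
    """
    n = len(points)
    all_pairs_avg = sum(
        dist(points[i], points[j]) ** p for i in range(n) for j in range(n)
    ) / (n * n)
    edge_avg = sum(dist(points[u], points[v]) ** p for u, v in edges) / len(edges)
    if edge_avg == 0:
        raise ValueError("all edges have zero length; the ratio is undefined")
    return all_pairs_avg / edge_avg


def abs_dist(x, y):
    """Distance on the real line, used here as a toy metric."""
    return abs(x - y)


# Toy data: four points on the real line (hypothetical example values).
points = [0.0, 1.0, 3.0, 7.0]

# Complete graph K_4: every unordered pair is an edge.
complete_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]

# Cycle C_4 on the same vertices.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

ratio_complete = poincare_ratio(points, complete_edges, abs_dist)
ratio_cycle = poincare_ratio(points, cycle_edges, abs_dist)
print(ratio_complete, ratio_cycle)
```

Under this convention the complete graph on n vertices gives a ratio of (n - 1)/n for every non-constant placement, so its gap is bounded regardless of the metric; sparser graphs such as the cycle can give larger ratios, which is why the gap is informative for sparse expander-like graphs.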
Pages: 787-800
Number of pages: 14