On the "dimensionality curse" and the "self-similarity blessing"

Cited by: 126
Authors
Korn, F
Pagel, BU
Faloutsos, C
Affiliations
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
[2] SAP AG, D-69190 Walldorf, Germany
[3] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Keywords
nearest-neighbor search; multimedia indexing; fractals
DOI
10.1109/69.908983
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spatial queries in high-dimensional spaces have been studied extensively recently. Among them, nearest-neighbor queries are important in many settings, including spatial databases ("Find the k closest cities") and multimedia databases ("Find the k most similar images"). Previous analyses have concluded that nearest-neighbor search is hopeless in high dimensions due to the notorious "curse of dimensionality." Here, we show that this may be overpessimistic. We show that what determines the search performance (at least for R-tree-like structures) is the intrinsic dimensionality of the data set and not the dimensionality of the address space (referred to as the embedding dimensionality). The typical (and often implicit) assumption in many previous studies is that the data is uniformly distributed, with independence between attributes. However, real data sets overwhelmingly disobey these assumptions; rather, they typically are skewed and exhibit intrinsic ("fractal") dimensionalities that are much lower than their embedding dimension, e.g., due to subtle dependencies between attributes. In this paper, we show how the Hausdorff and Correlation fractal dimensions of a data set can yield extremely accurate formulas that can predict the I/O performance to within one standard deviation on multiple real and synthetic data sets. The practical contributions of this work are our accurate formulas, which can be used for query optimization in spatial and multimedia databases. The major theoretical contribution is the "deflation" of the dimensionality curse: Our formulas and our experiments show that previous worst-case analyses of nearest-neighbor search in high dimensions are overpessimistic to the point of being unrealistic. The performance depends critically on the intrinsic ("fractal") dimensionality as opposed to the embedding dimension that the uniformity and independence assumptions incorrectly imply.
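To make the gap between embedding and intrinsic dimensionality concrete, the following is a minimal sketch of a box-counting estimate of the correlation fractal dimension. It is not the paper's I/O cost formula, which the abstract does not state. The function name `correlation_dimension`, the choice of grid sizes, and the synthetic data are all assumptions made for illustration. The estimate is the slope of log S2(r) against log r, where S2(r) is the sum of squared cell occupancies for a grid of cell side r.

```python
import numpy as np

def correlation_dimension(points, cell_sides):
    """Estimate the correlation fractal dimension D2 by box counting.

    Hypothetical helper for illustration only: D2 is taken as the
    least-squares slope of log(sum_i p_i^2) versus log(r), where p_i
    is the fraction of points falling in grid cell i of side r.
    """
    points = np.asarray(points, dtype=float)
    # Rescale so the data fits in the unit cube (one common scale for all axes).
    mins = points.min(axis=0)
    span = (points.max(axis=0) - mins).max()
    unit = (points - mins) / span

    log_r, log_s2 = [], []
    for r in cell_sides:
        # Integer grid coordinates of each point; clamp the boundary value 1.0
        # into the last cell instead of opening a new one.
        last = int(np.ceil(1.0 / r)) - 1
        cells = np.minimum(np.floor(unit / r).astype(np.int64), last)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        p = counts / len(points)
        log_r.append(np.log(r))
        log_s2.append(np.log(np.sum(p ** 2)))

    slope, _intercept = np.polyfit(log_r, log_s2, 1)
    return slope

# Synthetic data (an assumption, not from the paper): 10 attributes that are
# all copies of just two independent latent variables, so the embedding
# dimension is 10 but the points lie on a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.random((20000, 2))
data = latent[:, np.arange(10) % 2]

d2 = correlation_dimension(data, cell_sides=[1/4, 1/8, 1/16, 1/32])
print(f"embedding dimension = {data.shape[1]}, estimated D2 = {d2:.2f}")
```

On data like this the estimate should land near 2 rather than 10, which is the kind of attribute dependency the abstract points to. The cell sides need to stay coarse enough that occupied cells hold many points; otherwise sampling noise biases the slope downward.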
Pages: 96-111
Page count: 16