Machine learning friendly set version of Johnson-Lindenstrauss lemma

Times cited: 0
Author
Klopotek, Mieczyslaw A. [1]
Affiliation
[1] Polish Acad Sci, Inst Comp Sci, Ul Jana Kazimierza 5, PL-01248 Warsaw, Poland
Keywords
Johnson-Lindenstrauss lemma; Random projection; Sample distortion; Dimensionality reduction; Linear JL transform; k-means algorithm; Clusterability retention; RANDOM-PROJECTION; PROOF;
DOI
10.1007/s10115-019-01412-8
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The widely discussed and applied Johnson-Lindenstrauss (JL) Lemma has an existential form saying that for each set of data points $Q$ in $n$-dimensional space, there exists a transformation $f$ into an $n'$-dimensional space ($n' < n$) such that for each pair $u, v \in Q$, $(1-\delta)\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\delta)\|u-v\|^2$ for a user-defined error parameter $\delta$. Furthermore, it is asserted that with some positive probability the transformation $f$ may be found as a random projection (with scaling) onto the $n'$-dimensional subspace, so that after sufficiently many repetitions of random projection, $f$ will be found with user-defined success rate $1-\epsilon$. In this paper, we make a novel use of the JL Lemma. We prove a theorem stating that we can choose the target dimensionality in a random projection-type JL linear transformation in such a way that, for any user-predefined failure probability $\epsilon$, with probability $1-\epsilon$ all data points from $Q$ fall into the predefined error range $\delta$ when performing a single random projection. This result is important for applications such as data clustering, where we want an a priori dimensionality-reducing transformation instead of attempting a (large) number of them, as with the traditional Johnson-Lindenstrauss Lemma. Furthermore, we investigate an important issue: whether or not the projection according to the JL Lemma is really useful when conducting data processing, that is, whether the solutions to the clustering in the projected space apply to the original space. In particular, we take a closer look at the $k$-means algorithm and prove that a good solution in the projected space is also a good solution in the original space. Furthermore, under proper assumptions, local optima in the original space are also local optima in the projected space. We also investigate the broader issue of preserving clusterability under JL Lemma projection.
We define the conditions under which the clusterability property of the original space is transmitted to the projected space, so that a broad class of clustering algorithms for the original space is applicable in the projected space.
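The single-projection idea and the $k$-means claim in the abstract can be illustrated with a small NumPy sketch. It is not the paper's construction: the target dimension below comes from a standard Chernoff bound on the chi-square tail combined with a union bound over all $m(m-1)/2$ pairs, $n' \ge 2\ln\big(m(m-1)/\epsilon\big)/(\delta^2/2-\delta^3/3)$, which may differ from the bound proved in the paper. The function names (`target_dim`, `random_projection`, `max_distortion`, `kmeans_cost`, `demo`) and the synthetic Gaussian data are hypothetical. The last step checks one fixed partition: since the $k$-means cost of a cluster $C$ equals $\frac{1}{|C|}\sum_{\{u,v\}\subseteq C}\|u-v\|^2$, a projection that keeps every squared distance within $(1\pm\delta)$ keeps the cost of every partition within the same factor.

```python
import math

import numpy as np


def target_dim(m: int, delta: float, eps: float) -> int:
    """Target dimension n' for m points, distortion delta, failure probability eps.

    Assumption: Chernoff bound on the chi-square tail plus a union bound over
    the m(m-1)/2 pairs; not necessarily the bound proved in the paper.
    """
    denom = delta**2 / 2 - delta**3 / 3
    return math.ceil(2 * math.log(m * (m - 1) / eps) / denom)


def random_projection(X: np.ndarray, n_prime: int, rng: np.random.Generator) -> np.ndarray:
    """Map each row of X (m x n) to n' dims with a Gaussian matrix.

    Entries are N(0, 1/n'), so the 1/sqrt(n') scaling is built in and
    E||f(u) - f(v)||^2 = ||u - v||^2.
    """
    R = rng.normal(0.0, 1.0 / math.sqrt(n_prime), size=(X.shape[1], n_prime))
    return X @ R


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between all rows, via the Gram matrix."""
    sq = np.sum(X * X, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)


def max_distortion(X: np.ndarray, Y: np.ndarray) -> float:
    """Largest |ratio - 1| over all pairs, ratio = ||f(u)-f(v)||^2 / ||u-v||^2."""
    iu = np.triu_indices(X.shape[0], k=1)
    ratios = pairwise_sq_dists(Y)[iu] / pairwise_sq_dists(X)[iu]
    return float(np.max(np.abs(ratios - 1.0)))


def kmeans_cost(X: np.ndarray, labels: np.ndarray) -> float:
    """Sum of squared distances of the points to their cluster centroid."""
    return float(sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
                     for c in np.unique(labels)))


def demo(m=100, n=3000, delta=0.25, eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(m, n))             # synthetic data, purely illustrative
    n_prime = target_dim(m, delta, eps)     # chosen a priori, before projecting
    Y = random_projection(X, n_prime, rng)  # a single projection, no repetitions
    labels = np.arange(m) % 3               # an arbitrary fixed partition
    ratio = kmeans_cost(Y, labels) / kmeans_cost(X, labels)
    return n_prime, max_distortion(X, Y), ratio


n_prime, dist, ratio = demo()
print(f"n'={n_prime}, max pair distortion={dist:.3f}, k-means cost ratio={ratio:.3f}")
```

With the default parameters the sketch picks $n' = 937$, draws one projection and then checks it, rather than repeating projections until one succeeds. A Gaussian matrix is used instead of an orthogonal projection onto a random subspace because it needs no orthogonalization and satisfies the same form of chi-square tail bound.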
Pages: 1961-2009
Page count: 49
Related papers (50 items in total)
  • [41] On Fast Johnson-Lindenstrauss Embeddings of Compact Submanifolds of R^N with Boundary
    Iwen, Mark A.
    Schmidt, Benjamin
    Tavakoli, Arman
    DISCRETE & COMPUTATIONAL GEOMETRY, 2024, 71 (02) : 498 - 555
  • [42] Dimension reduction for data streams based on Johnson-Lindenstrauss transform
    Yang, Jing
    Zhao, Jia-Shi
    Zhang, Jian-Pei
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2013, 43 (06) : 1626 - 1630
  • [43] Optimal fast Johnson-Lindenstrauss embeddings for large data sets
    Bamberger, Stefan
    Krahmer, Felix
    SAMPLING THEORY SIGNAL PROCESSING AND DATA ANALYSIS, 2021, 19 (01)
  • [44] A Space with No Unconditional Basis that Satisfies the Johnson–Lindenstrauss Lemma
    Suárez de la Fuente, Jesús
    Results in Mathematics, 2019, 74
  • [45] NEW AND IMPROVED JOHNSON-LINDENSTRAUSS EMBEDDINGS VIA THE RESTRICTED ISOMETRY PROPERTY
    Krahmer, Felix
    Ward, Rachel
    SIAM JOURNAL ON MATHEMATICAL ANALYSIS, 2011, 43 (03) : 1269 - 1281
  • [46] Clustering and Classification to Evaluate Data Reduction via Johnson-Lindenstrauss Transform
    Ghalib, Abdulaziz
    Jessup, Tyler D.
    Johnson, Julia
    Monemian, Seyedamin
    ADVANCES IN INFORMATION AND COMMUNICATION, VOL 2, 2020, 1130 : 190 - 209
  • [47] Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Subconstant Error
    Jayram, T. S.
    Woodruff, David P.
    ACM TRANSACTIONS ON ALGORITHMS, 2013, 9 (03)
  • [48] Modewise Johnson-Lindenstrauss embeddings for nuclear many-body theory
    Zare, A.
    Wirth, R.
    Haselby, C. A.
    Hergert, H.
    Iwen, M.
    EUROPEAN PHYSICAL JOURNAL A, 2023, 59 (05)
  • [49] Guarantees for the Kronecker fast Johnson-Lindenstrauss transform using a coherence and sampling argument
    Malik, Osman Asif
    Becker, Stephen
    LINEAR ALGEBRA AND ITS APPLICATIONS, 2020, 602 : 120 - 137
  • [50] Acceleration of randomized Kaczmarz method via the Johnson–Lindenstrauss Lemma
    Eldar, Yonina C.
    Needell, Deanna
    Numerical Algorithms, 2011, 58 : 163 - 177