Streaming Euclidean k-median and k-means with o(log n) Space

Cited by: 0
Authors
Cohen-Addad, Vincent [1]
Woodruff, David P. [2]
Zhou, Samson [3]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Texas A&M Univ, College Station, TX USA
Keywords
streaming model; clustering; sublinear algorithms; coresets
DOI
10.1109/FOCS57990.2023.00057
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
We consider the classic Euclidean k-median and k-means objectives on data streams, where the goal is to provide a $(1+\varepsilon)$-approximation to the optimal k-median or k-means solution while using as little memory as possible. Over the last 20 years, clustering in data streams has received a tremendous amount of attention and has been the test-bed for a large variety of new techniques, including coresets, the merge-and-reduce framework, bicriteria approximation, sensitivity sampling, and so on. Despite this intense effort to obtain smaller sketches for these problems, all known techniques require storing at least $\Omega(\log(n\Delta))$ words of memory, where $n$ is the size of the input and $\Delta$ is the aspect ratio. A natural question is whether one can beat this logarithmic dependence on $n$ and $\Delta$. In this paper, we break this barrier by first giving an insertion-only streaming algorithm that achieves a $(1+\varepsilon)$-approximation to the more general $(k,z)$-clustering problem, using $\tilde{O}\left(\frac{dk}{\varepsilon^2}\right) \cdot 2^{z\log z} \cdot \min\left(\frac{1}{\varepsilon^z}, k\right) \cdot \mathrm{poly}(\log\log(n\Delta))$ words of memory. Our techniques can also be used to achieve two-pass algorithms for k-median and k-means clustering on dynamic streams using $\tilde{O}\left(\frac{1}{\varepsilon^2}\right) \cdot \mathrm{poly}(d, k, \log\log(n\Delta))$ words of memory.
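The $(k,z)$-clustering objective mentioned in the abstract generalizes both problems under its standard definition: given a set of centers, each point pays its Euclidean distance to the nearest center raised to the power $z$, so $z = 1$ gives k-median and $z = 2$ gives k-means. The following is a minimal offline Python sketch of that cost, assuming points are coordinate tuples; the function names `clustering_cost` and `is_one_plus_eps_approx` are hypothetical illustrations, not taken from the paper. It evaluates the objective only and is not the paper's streaming algorithm.

```python
import math
from typing import Sequence

Point = Sequence[float]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point], z: float) -> float:
    """(k, z)-clustering cost of `points` with respect to `centers`.

    Each point contributes (Euclidean distance to its nearest center) ** z.
    z = 1 is the k-median objective, z = 2 is the k-means objective.
    """
    if not centers:
        raise ValueError("at least one center is required")
    if z <= 0:
        raise ValueError("z must be positive")
    total = 0.0
    for p in points:
        # math.dist (Python 3.8+) returns the Euclidean distance between two points
        nearest = min(math.dist(p, c) for c in centers)
        total += nearest ** z
    return total


def is_one_plus_eps_approx(cost: float, opt_cost: float, eps: float) -> bool:
    """Hypothetical helper: is `cost` within a (1 + eps) factor of the optimal cost?"""
    return cost <= (1.0 + eps) * opt_cost


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
    ctrs = [(1.0, 0.0), (10.0, 0.0)]
    print(clustering_cost(pts, ctrs, z=1))  # k-median cost: 1 + 2 + 0 = 3.0
    print(clustering_cost(pts, ctrs, z=2))  # k-means cost: 1 + 4 + 0 = 5.0
```

This sketch keeps every point in memory; the point of the streaming algorithms described above is to approximate the optimum of this objective while storing far less than the whole stream.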
Pages: 883-908
Number of pages: 26