Streaming Euclidean k-median and k-means with o(log n) Space

Citations: 0
Authors
Cohen-Addad, Vincent [1]
Woodruff, David P. [2]
Zhou, Samson [3]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Texas A&M Univ, College Stn, TX USA
Keywords
streaming model; clustering; sublinear algorithms; coresets
DOI
10.1109/FOCS57990.2023.00057
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We consider the classic Euclidean $k$-median and $k$-means objectives on data streams, where the goal is to provide a $(1+\epsilon)$-approximation to the optimal $k$-median or $k$-means solution while using as little memory as possible. Over the last 20 years, clustering in data streams has received a tremendous amount of attention and has been the test-bed for a large variety of new techniques, including coresets, the merge-and-reduce framework, bicriteria approximation, and sensitivity sampling. Despite this intense effort to obtain smaller sketches for these problems, all known techniques require storing at least $\Omega(\log(n\Delta))$ words of memory, where $n$ is the size of the input and $\Delta$ is the aspect ratio. A natural question is whether one can beat this logarithmic dependence on $n$ and $\Delta$. In this paper, we break this barrier by first giving an insertion-only streaming algorithm that achieves a $(1+\epsilon)$-approximation to the more general $(k, z)$-clustering problem, using $\tilde{O}\left(\frac{dk}{\epsilon^2}\right) \cdot 2^{z \log z} \cdot \min\left(\frac{1}{\epsilon^z}, k\right) \cdot \mathrm{poly}(\log\log(n\Delta))$ words of memory. Our techniques can also be used to achieve two-pass algorithms for $k$-median and $k$-means clustering on dynamic streams using $\tilde{O}\left(\frac{1}{\epsilon^2}\right) \cdot \mathrm{poly}(d, k, \log\log(n\Delta))$ words of memory.
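For readers less familiar with the (k, z)-clustering objective named in the abstract, the following is a minimal Python sketch of the cost function only. It assumes the standard definition: the sum, over all input points, of the Euclidean distance to the nearest center raised to the power z, so that z = 1 gives k-median and z = 2 gives k-means. It is not the paper's streaming algorithm, and the function name `kz_cost` and the toy points are hypothetical, chosen purely for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def kz_cost(points: Sequence[Point], centers: Sequence[Point], z: float) -> float:
    """Return the (k, z)-clustering cost of `centers` on `points`.

    Assumed definition: sum over points of
    (Euclidean distance to the nearest center) ** z.
    """
    if not centers:
        raise ValueError("need at least one center")
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)


# Hypothetical toy instance: two well-separated pairs of points in the plane,
# with one center placed midway between each pair (distance 2 to each point).
points = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 2.0), (10.0, 2.0)]

k_median_cost = kz_cost(points, centers, z=1)  # z = 1: k-median objective
k_means_cost = kz_cost(points, centers, z=2)   # z = 2: k-means objective
print(k_median_cost, k_means_cost)
```

A (1 + epsilon)-approximation in the sense of the abstract is a set of k centers whose cost under this function is at most (1 + epsilon) times the minimum cost over all choices of k centers.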
Pages: 883 - 908
Page count: 26
Related papers
50 items in total
  • [31] A nearly linear-time approximation scheme for the Euclidean k-median problem
    Kolliopoulos, SG
    Rao, S
    ALGORITHMS - ESA'99, 1999, 1643 : 378 - 389
  • [32] Approximation of Kernel k-Means for Streaming Data
    Havens, Timothy C.
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 509 - 512
  • [33] A Streaming Algorithm for k-Means with Approximate Coreset
    Li, Min
    Xu, Dachuan
    Zhang, Dongmei
    Zhang, Tong
    ASIA-PACIFIC JOURNAL OF OPERATIONAL RESEARCH, 2019, 36 (01)
  • [34] Streaming k-Means Clustering with Fast Queries
    Zhang, Yu
    Tangwongsan, Kanat
    Tirthapura, Srikanta
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 449 - 460
  • [35] A nearly linear-time approximation scheme for the Euclidean k-median problem
    Kolliopoulos, Stavros G.
    Rao, Satish
    SIAM JOURNAL ON COMPUTING, 2007, 37 (03) : 757 - 782
  • [36] Approximation algorithms for the k-median problem
    Solis-Oba, R
    EFFICIENT APPROXIMATION AND ONLINE ALGORITHMS: RECENT PROGRESS ON CLASSICAL COMBINATORIAL OPTIMIZATION PROBLEMS AND NEW APPLICATIONS, 2006, 3484 : 292 - 320
  • [37] A Method of Two Stage Clustering Using Agglomerative Hierarchical Algorithms with One-Pass k-Means++ or k-Median++
    Tamura, Yusuke
    Miyamoto, Sadaaki
    2014 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC), 2014, : 281 - 285
  • [38] An approximation algorithm for k-median with priorities
    Zhen ZHANG
    Qilong FENG
    Jinhui XU
    Jianxin WANG
    Science China (Information Sciences), 2021, 64 (05) : 45 - 46
  • [39] A FEASIBLE K-MEANS KERNEL TRICK UNDER NON-EUCLIDEAN FEATURE SPACE
    Klopotek, Robert
    Klopotek, Mieczyslaw
    Wierzchon, Slawomir
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2020, 30 (04) : 703 - 715
  • [40] On k-Median Clustering in High Dimensions
    Chen, Ke
    PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 1177 - 1185