Streaming Euclidean k-median and k-means with o(log n) Space

Authors
Cohen-Addad, Vincent [1]
Woodruff, David P. [2]
Zhou, Samson [3]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Texas A&M Univ, College Stn, TX USA
Keywords
streaming model; clustering; sublinear algorithms; coresets
DOI
10.1109/FOCS57990.2023.00057
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We consider the classic Euclidean k-median and k-means objectives on data streams, where the goal is to provide a $(1+\varepsilon)$-approximation to the optimal k-median or k-means solution while using as little memory as possible. Over the last 20 years, clustering in data streams has received a tremendous amount of attention and has been the test-bed for a large variety of new techniques, including coresets, the merge-and-reduce framework, bicriteria approximation, and sensitivity sampling. Despite this intense effort to obtain smaller sketches for these problems, all known techniques require storing at least $\Omega(\log(n\Delta))$ words of memory, where $n$ is the size of the input and $\Delta$ is the aspect ratio. A natural question is whether one can beat this logarithmic dependence on $n$ and $\Delta$. In this paper, we break this barrier by first giving an insertion-only streaming algorithm that achieves a $(1+\varepsilon)$-approximation to the more general $(k, z)$-clustering problem, using $\tilde{O}\left(\frac{dk}{\varepsilon^2}\right) \cdot \left(2^{z \log z}\right) \cdot \min\left(\frac{1}{\varepsilon^z}, k\right) \cdot \mathrm{poly}(\log\log(n\Delta))$ words of memory. Our techniques can also be used to achieve two-pass algorithms for k-median and k-means clustering on dynamic streams using $\tilde{O}\left(\frac{1}{\varepsilon^2}\right) \cdot \mathrm{poly}(d, k, \log\log(n\Delta))$ words of memory.
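For orientation, the (k, z)-clustering objective named in the abstract is usually defined as the sum, over all input points, of the z-th power of the Euclidean distance to the nearest center, so that z = 1 gives k-median and z = 2 gives k-means. The sketch below only evaluates that cost for a fixed set of centers. It is not the streaming algorithm of the paper, and the function name `kz_cost` and the sample points are illustrative assumptions.

```python
import math

def kz_cost(points, centers, z):
    """Standard (k, z)-clustering cost: for each point, take the Euclidean
    distance to its nearest center, raise it to the power z, and sum.
    (Hypothetical helper for illustration; not taken from the paper.)"""
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)

# Illustrative data: three points in the plane and k = 2 candidate centers.
points = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]

k_median_cost = kz_cost(points, centers, z=1)  # nearest distances 1, 3, 0
k_means_cost = kz_cost(points, centers, z=2)   # squared distances 1, 9, 0
```

A (1+epsilon)-approximation in this setting is a set of k centers whose cost is at most (1+epsilon) times the minimum of this quantity over all choices of k centers.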
Pages: 883-908
Number of pages: 26