ON CORESETS FOR k-MEDIAN AND k-MEANS CLUSTERING IN METRIC AND EUCLIDEAN SPACES AND THEIR APPLICATIONS

Cited by: 105
Authors
Chen, Ke [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Keywords
k-median clustering; k-means clustering; coreset; random sampling; high dimensions; approximation algorithms; FACILITY LOCATION; ALGORITHM
DOI
10.1137/070699007
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
We present new approximation algorithms for the k-median and k-means clustering problems. To this end, we obtain small coresets for k-median and k-means clustering in general metric spaces and in Euclidean spaces. In $\mathbb{R}^d$, these coresets are of size with polynomial dependency on the dimension $d$. This leads to $(1 + \epsilon)$-approximation algorithms to the optimal k-median and k-means clustering in $\mathbb{R}^d$, with running time $O(ndk + 2^{(k/\epsilon)^{O(1)}} d^2 \log^{k+2} n)$, where $n$ is the number of points. This improves over previous results. We use those coresets to maintain a $(1 + \epsilon)$-approximate k-median and k-means clustering of a stream of points in $\mathbb{R}^d$, using $O(d^2 k^2 \epsilon^{-2} \log^8 n)$ space. These are the first streaming algorithms, for those problems, that have space complexity with polynomial dependency on the dimension.
Pages: 923-947
Page count: 25
Related papers
50 in total
  • [1] Smaller Coresets for k-Median and k-Means Clustering
    Sariel Har-Peled
    Akash Kushal
    [J]. Discrete & Computational Geometry, 2007, 37 : 3 - 19
  • [2] Smaller coresets for k-median and k-means clustering
    Har-Peled, Sariel
    Kushal, Akash
    [J]. DISCRETE & COMPUTATIONAL GEOMETRY, 2007, 37 (01) : 3 - 19
  • [3] Towards Optimal Lower Bounds for k-Median and k-Means Coresets
    Cohen-Addad, Vincent
    Larsen, Kasper Green
    Saulpic, David
    Schwiegelshohn, Chris
    [J]. PROCEEDINGS OF THE 54TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (STOC '22), 2022, : 1038 - 1051
  • [4] Stability yields a PTAS for k-Median and k-Means Clustering
    Awasthi, Pranjal
    Blum, Avrim
    Sheffet, Or
    [J]. 2010 IEEE 51ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2010, : 309 - 318
  • [5] Improved Coresets for Euclidean k-Means
    Cohen-Addad, Vincent
    Larsen, Kasper Green
    Saulpic, David
    Schwiegelshohn, Chris
    Sheikh-Omar, Omar Ali
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [6] Robust K-Median and K-Means Clustering Algorithms for Incomplete Data
    Li, Jinhua
    Song, Shiji
    Zhang, Yuli
    Zhou, Zhen
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [7] Streaming Euclidean k-median and k-means with o(log n) Space
    Cohen-Addad, Vincent
    Woodruff, David P.
    Zhou, Samson
    [J]. 2023 IEEE 64TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, FOCS, 2023, : 883 - 908
  • [8] Outlier Detection using Clustering Techniques - K-means and K-median
    Angelin, B.
    Geetha, A.
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS 2020), 2020, : 373 - 378
  • [9] BETTER GUARANTEES FOR k-MEANS AND EUCLIDEAN k-MEDIAN BY PRIMAL-DUAL ALGORITHMS
    Ahmadian, Sara
    Norouzi-Fard, Ashkan
    Svensson, Ola
    Ward, Justin
    [J]. SIAM JOURNAL ON COMPUTING, 2020, 49 (04) : 97 - 156
  • [10] Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms
    Ahmadian, Sara
    Norouzi-Fard, Ashkan
    Svensson, Ola
    Ward, Justin
    [J]. 2017 IEEE 58TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2017, : 61 - 72