WORST-CASE AND SMOOTHED ANALYSIS OF K-MEANS CLUSTERING WITH BREGMAN DIVERGENCES

Cited by: 0
Authors
Manthey, Bodo [1]
Roeglin, Heiko [2]
Affiliations
[1] Univ Twente, Dept Appl Math, POB 217, NL-7500 AE Enschede, Netherlands
[2] Univ Bonn, Dept Comp Sci, D-53113 Bonn, Germany
Keywords
DOI
Not available
Chinese Library Classification
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
The k-means method is the method of choice for clustering large-scale data sets, and it performs exceedingly well in practice despite its exponential worst-case running time. To narrow the gap between theory and practice, k-means has been studied in the semi-random input model of smoothed analysis, which often leads to more realistic conclusions than mere worst-case analysis. For the case that n data points in R^d are perturbed by Gaussian noise with standard deviation σ, it has been shown that the expected running time is bounded by a polynomial in n and 1/σ. This result assumes that squared Euclidean distances are used as the distance measure. In many applications, however, data is to be clustered with respect to Bregman divergences rather than squared Euclidean distances. A prominent example is the Kullback-Leibler divergence (a.k.a. relative entropy), which is commonly used to cluster web pages. To broaden the knowledge about this important class of distance measures, we analyze the running time of the k-means method for Bregman divergences. We first give a smoothed analysis of k-means with (almost) arbitrary Bregman divergences, and we show bounds of poly(n^{√k}, 1/σ) and k^{kd} · poly(n, 1/σ). The latter yields a polynomial bound if k and d are small compared to n. On the other hand, we show that the exponential lower bound carries over to a huge class of Bregman divergences.
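The abstract refers to running the k-means method with a Bregman divergence, such as the Kullback-Leibler divergence, in place of squared Euclidean distances. The following is a minimal illustrative sketch (not the authors' code) of such a Lloyd-style iteration in Python, using the KL divergence on points from the probability simplex; the function names and parameters are hypothetical. It relies on the standard fact that, for any Bregman divergence, the divergence-minimizing center of a cluster is its arithmetic mean.

import numpy as np

def kl_divergence(p, q):
    # KL divergence D(p || q) for strictly positive probability vectors.
    return float(np.sum(p * np.log(p / q)))

def bregman_kmeans(points, k, iters=100, seed=0):
    # Hypothetical sketch of the k-means method with a Bregman divergence.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to the center with smallest divergence.
        labels = np.array([
            np.argmin([kl_divergence(p, c) for c in centers]) for p in points
        ])
        # Update step: the new center of a cluster is its arithmetic mean,
        # which is the divergence-minimizing center for any Bregman divergence.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Example: cluster 200 random points from the 3-dimensional probability simplex.
data = np.random.default_rng(1).dirichlet(alpha=[2.0, 2.0, 2.0], size=200)
centers, labels = bregman_kmeans(data, k=3)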
Pages: 94-132
Page count: 39
Related Papers
50 records in total
  • [1] Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences
    Manthey, Bodo
    Roeglin, Heiko
    ALGORITHMS AND COMPUTATION, PROCEEDINGS, 2009, 5878 : 1024 - +
  • [2] Worst-Case and Smoothed Analysis of the Hartigan-Wong Method for k-Means Clustering
    Manthey, Bodo
    van Rhijn, Jesse
    41ST INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE, STACS 2024, 2024, 289
  • [3] Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method
    Arthur, David
    Vassilvitskii, Sergei
    47TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2006, : 153 - +
  • [4] WORST-CASE AND SMOOTHED ANALYSIS OF THE ICP ALGORITHM, WITH AN APPLICATION TO THE k-MEANS METHOD
    Arthur, David
    Vassilvitskii, Sergei
    SIAM JOURNAL ON COMPUTING, 2009, 39 (02) : 766 - 782
  • [5] Multi-view K-Means Clustering with Bregman Divergences
    Wu, Yan
    Du, Liang
    Cheng, Honghong
    ARTIFICIAL INTELLIGENCE (ICAI 2018), 2018, 888 : 26 - 38
  • [6] k-Means Clustering with Hölder Divergences
    Nielsen, Frank
    Sun, Ke
    Marchand-Maillet, Stephane
    GEOMETRIC SCIENCE OF INFORMATION, GSI 2017, 2017, 10589 : 856 - 863
  • [7] On Clustering Histograms with k-Means by Using Mixed α-Divergences
    Nielsen, Frank
    Nock, Richard
    Amari, Shun-ichi
    ENTROPY, 2014, 16 (06) : 3273 - 3301
  • [8] Smoothed Analysis of the k-Means Method
    Arthur, David
    Manthey, Bodo
    Roeglin, Heiko
    JOURNAL OF THE ACM, 2011, 58 (05)
  • [9] Bregman Power k-Means for Clustering Exponential Family Data
    Vellal, Adithya
    Chakraborty, Saptarshi
    Xu, Jason
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [10] Improving Bregman k-means
    Ashour, Wesam
    Fyfe, Colin
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2014, 6 (01) : 65 - 82