WORST-CASE AND SMOOTHED ANALYSIS OF K-MEANS CLUSTERING WITH BREGMAN DIVERGENCES

Cited by: 0
Authors
Manthey, Bodo [1]
Roeglin, Heiko [2]
Affiliations
[1] Univ Twente, Dept Appl Math, POB 217, NL-7500 AE Enschede, Netherlands
[2] Univ Bonn, Dept Comp Sci, D-53113 Bonn, Germany
Keywords
DOI
Not available
Chinese Library Classification
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
The k-means method is the method of choice for clustering large-scale data sets, and it performs exceedingly well in practice despite its exponential worst-case running time. To narrow the gap between theory and practice, k-means has been studied in the semi-random input model of smoothed analysis, which often leads to more realistic conclusions than mere worst-case analysis. For the case that n data points in R^d are perturbed by Gaussian noise with standard deviation σ, it has been shown that the expected running time is bounded by a polynomial in n and 1/σ. This result assumes that squared Euclidean distances are used as the distance measure. In many applications, however, data is to be clustered with respect to Bregman divergences rather than squared Euclidean distances. A prominent example is the Kullback-Leibler divergence (a.k.a. relative entropy), which is commonly used to cluster web pages. To broaden the knowledge about this important class of distance measures, we analyze the running time of the k-means method for Bregman divergences. We first give a smoothed analysis of k-means with (almost) arbitrary Bregman divergences, and we show bounds of poly(n^{√k}, 1/σ) and k^{kd} · poly(n, 1/σ). The latter yields a polynomial bound if k and d are small compared to n. On the other hand, we show that the exponential lower bound carries over to a huge class of Bregman divergences.
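The abstract refers to running the k-means method with a Bregman divergence, such as the Kullback-Leibler divergence, in place of squared Euclidean distances. The following is a minimal illustrative sketch (not the authors' code) of such a Lloyd-style iteration in Python, using the KL divergence on points from the probability simplex; the function names and parameters are hypothetical. It relies on the standard fact that, for any Bregman divergence, the divergence-minimizing center of a cluster is its arithmetic mean.

import numpy as np

def kl_divergence(p, q):
    # KL divergence D(p || q) for strictly positive probability vectors.
    return float(np.sum(p * np.log(p / q)))

def bregman_kmeans(points, k, iters=100, seed=0):
    # Hypothetical sketch of the k-means method with a Bregman divergence.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to the center with smallest divergence.
        labels = np.array([
            np.argmin([kl_divergence(p, c) for c in centers]) for p in points
        ])
        # Update step: the new center of a cluster is its arithmetic mean,
        # which is the divergence-minimizing center for any Bregman divergence.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Example: cluster 200 random points from the 3-dimensional probability simplex.
data = np.random.default_rng(1).dirichlet(alpha=[2.0, 2.0, 2.0], size=200)
centers, labels = bregman_kmeans(data, k=3)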
Pages: 94-132
Page count: 39
Related Papers
50 records in total
  • [1] Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences
    Manthey, Bodo
    Roeglin, Heiko
    ALGORITHMS AND COMPUTATION, PROCEEDINGS, 2009, 5878 : 1024 - +
  • [2] Worst-Case and Smoothed Analysis of the Hartigan-Wong Method for k-Means Clustering
    Manthey, Bodo
    van Rhijn, Jesse
    41ST INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE, STACS 2024, 2024, 289
  • [3] Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method
    Arthur, David
    Vassilvitskii, Sergei
    47TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2006, : 153 - +
  • [4] WORST-CASE AND SMOOTHED ANALYSIS OF THE ICP ALGORITHM, WITH AN APPLICATION TO THE k-MEANS METHOD
    Arthur, David
    Vassilvitskii, Sergei
    SIAM JOURNAL ON COMPUTING, 2009, 39 (02) : 766 - 782
  • [5] Multi-view K-Means Clustering with Bregman Divergences
    Wu, Yan
    Du, Liang
    Cheng, Honghong
    ARTIFICIAL INTELLIGENCE (ICAI 2018), 2018, 888 : 26 - 38
  • [6] k-Means Clustering with Hölder Divergences
    Nielsen, Frank
    Sun, Ke
    Marchand-Maillet, Stephane
    GEOMETRIC SCIENCE OF INFORMATION, GSI 2017, 2017, 10589 : 856 - 863
  • [7] On Clustering Histograms with k-Means by Using Mixed α-Divergences
    Nielsen, Frank
    Nock, Richard
    Amari, Shun-ichi
    ENTROPY, 2014, 16 (06) : 3273 - 3301
  • [8] Smoothed Analysis of the k-Means Method
    Arthur, David
    Manthey, Bodo
    Roeglin, Heiko
    JOURNAL OF THE ACM, 2011, 58 (05)
  • [9] Bregman Power k-Means for Clustering Exponential Family Data
    Vellal, Adithya
    Chakraborty, Saptarshi
    Xu, Jason
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [10] Improving Bregman k-means
    Ashour, Wesam
    Fyfe, Colin
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2014, 6 (01) : 65 - 82