k-variates++: more pluses in the k-means++

Cited by: 0
Authors
Nock, Richard [1]
Canyasse, Raphael [3]
Boreli, Roksana [2]
Nielsen, Frank [4]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Univ New South Wales, Sydney, NSW, Australia
[3] Technion, Haifa, Israel
[4] Sony CS Labs Inc, Tokyo, Japan
Funding
Australian Research Council;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, and a generalisation of the well-known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of a bias + variance approximation bound on the global optimum. This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential, actually approaching the statistical lower bound. We show that k-variates++ reduces to efficient (biased-seeding) clustering algorithms tailored to specific frameworks, including distributed, streaming and on-line clustering, with direct approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For both the specific frameworks considered here and the differential privacy setting, there are few or no prior results on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or to display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is no closed-form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performance versus the state of the art.
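For concreteness, the sketch below shows the standard k-means++ (D²) seeding of Arthur and Vassilvitskii that the abstract refers to, and that k-variates++ generalises by replacing each sampled data point with a general density anchored at it. This is a minimal illustration only, assuming Euclidean data in a NumPy array; the function name kmeanspp_seed and the synthetic usage example are our own and are not taken from the paper.

    import numpy as np

    def kmeanspp_seed(X, k, rng=None):
        """Standard k-means++ (D^2) seeding (Arthur & Vassilvitskii, 2007).

        k-variates++ generalises this step by drawing each new centre from a
        density anchored at the sampled point rather than the point itself.
        """
        rng = np.random.default_rng(rng)
        n = X.shape[0]
        # First centre: chosen uniformly at random among the data points.
        centres = [X[rng.integers(n)]]
        # Squared distance of every point to its closest centre so far.
        d2 = np.sum((X - centres[0]) ** 2, axis=1)
        for _ in range(1, k):
            # D^2 sampling: next centre picked with probability proportional to d2.
            idx = rng.choice(n, p=d2 / d2.sum())
            centres.append(X[idx])
            d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
        return np.array(centres)

    # Illustrative usage on synthetic 2-D data with k = 3.
    X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
    print(kmeanspp_seed(X, 3))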
Pages: 10
Related Papers
50 records in total
  • [1] Improved Guarantees for k-means++ and k-means++ Parallel
    Makarychev, Konstantin
    Reddy, Aravind
    Shan, Liren
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] Exact Acceleration of K-Means++ and K-Means∥
    Raff, Edward
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2928 - 2935
  • [3] On the Consistency of k-means++ algorithm
    Klopotek, Mieczyslaw A.
    [J]. FUNDAMENTA INFORMATICAE, 2020, 172 (04) : 361 - 377
  • [4] Comparison of K-means and K-means++ for image compression with thermographic images
    Biswas, Hridoy
    Umbaugh, Scott E.
    Marino, Dominic
    Sackman, Joseph
    [J]. THERMOSENSE: THERMAL INFRARED APPLICATIONS XLIII, 2021, 11743
  • [5] k-means++: Few More Steps Yield Constant Approximation
    Choo, Davin
    Grunau, Christoph
    Portmann, Julian
    Rozhon, Vaclav
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [6] k-means++: Few More Steps Yield Constant Approximation
    Choo, Davin
    Grunau, Christoph
    Portmann, Julian
    Rozhon, Vaclav
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [7] Robust k-means++
    Deshpande, Amit
    Kacham, Praneeth
    Pratap, Rameshwar
    [J]. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020), 2020, 124 : 799 - 808
  • [8] k-means++: The Advantages of Careful Seeding
    Arthur, David
    Vassilvitskii, Sergei
    [J]. PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2007, : 1027 - 1035
  • [9] Efficient k-Means++ Approximation with MapReduce
    Xu, Yujie
    Qu, Wenyu
    Li, Zhiyang
    Min, Geyong
    Li, Keqiu
    Liu, Zhaobin
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (12) : 3135 - 3144
  • [10] Approximate K-Means++ in Sublinear Time
    Bachem, Olivier
    Lucic, Mario
    Hassani, S. Hamed
    Krause, Andreas
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1459 - 1467