k-variates++: more pluses in the k-means++

Cited by: 0
Authors
Nock, Richard [1]
Canyasse, Raphael [3]
Boreli, Roksana [2]
Nielsen, Frank [4]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Univ New South Wales, Sydney, NSW, Australia
[3] Technion, Haifa, Israel
[4] Sony CS Labs Inc, Tokyo, Japan
Funding
Australian Research Council;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, and a generalisation of the well-known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of a bias + variance approximation bound on the global optimum. This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential, actually approaching the statistical lower bound. We show that k-variates++ reduces to efficient (biased-seeding) clustering algorithms tailored to specific frameworks, including distributed, streaming and on-line clustering, with direct approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For both the specific frameworks considered here and the differential privacy setting, there are few or no prior results on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or to display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is no closed-form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performance versus the state of the art.
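For concreteness, the sketch below shows the standard k-means++ (D²) seeding of Arthur and Vassilvitskii that the abstract refers to, and that k-variates++ generalises by replacing each sampled data point with a general density anchored at it. This is a minimal illustration only, assuming Euclidean data in a NumPy array; the function name kmeanspp_seed and the synthetic usage example are our own and are not taken from the paper.

    import numpy as np

    def kmeanspp_seed(X, k, rng=None):
        """Standard k-means++ (D^2) seeding (Arthur & Vassilvitskii, 2007).

        k-variates++ generalises this step by drawing each new centre from a
        density anchored at the sampled point rather than the point itself.
        """
        rng = np.random.default_rng(rng)
        n = X.shape[0]
        # First centre: chosen uniformly at random among the data points.
        centres = [X[rng.integers(n)]]
        # Squared distance of every point to its closest centre so far.
        d2 = np.sum((X - centres[0]) ** 2, axis=1)
        for _ in range(1, k):
            # D^2 sampling: next centre picked with probability proportional to d2.
            idx = rng.choice(n, p=d2 / d2.sum())
            centres.append(X[idx])
            d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
        return np.array(centres)

    # Illustrative usage on synthetic 2-D data with k = 3.
    X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
    print(kmeanspp_seed(X, 3))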
Pages: 10
Related Papers
50 records in total
  • [1] Improved Guarantees for k-means++ and k-means++ Parallel
    Makarychev, Konstantin
    Reddy, Aravind
    Shan, Liren
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] Exact Acceleration of K-Means++ and K-Means∥
    Raff, Edward
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2928 - 2935
  • [3] On the Consistency of k-means++ algorithm
    Klopotek, Mieczyslaw A.
    [J]. FUNDAMENTA INFORMATICAE, 2020, 172 (04) : 361 - 377
  • [4] Comparison of K-means and K-means++ for image compression with thermographic images
    Biswas, Hridoy
    Umbaugh, Scott E.
    Marino, Dominic
    Sackman, Joseph
    [J]. THERMOSENSE: THERMAL INFRARED APPLICATIONS XLIII, 2021, 11743
  • [5] k-means++: Few More Steps Yield Constant Approximation
    Choo, Davin
    Grunau, Christoph
    Portmann, Julian
    Rozhon, Vaclav
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [6] k-means++: Few More Steps Yield Constant Approximation
    Choo, Davin
    Grunau, Christoph
    Portmann, Julian
    Rozhon, Vaclav
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [7] Robust k-means++
    Deshpande, Amit
    Kacham, Praneeth
    Pratap, Rameshwar
    [J]. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020), 2020, 124 : 799 - 808
  • [8] k-means++: The Advantages of Careful Seeding
    Arthur, David
    Vassilvitskii, Sergei
    [J]. PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2007, : 1027 - 1035
  • [9] Efficient k-Means++ Approximation with MapReduce
    Xu, Yujie
    Qu, Wenyu
    Li, Zhiyang
    Min, Geyong
    Li, Keqiu
    Liu, Zhaobin
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (12) : 3135 - 3144
  • [10] Approximate K-Means++ in Sublinear Time
    Bachem, Olivier
    Lucic, Mario
    Hassani, S. Hamed
    Krause, Andreas
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1459 - 1467