Clustering Stable Instances of Euclidean k-means

Cited by: 0
Authors
Dutta, Abhratanu [1]
Vijayaraghavan, Aravindan [1]
Wang, Alex [2]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
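Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code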
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While optimizing the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm to this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. A stable instance has a unique optimal k-means solution that does not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we can design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets, and demonstrating that our algorithm performs well on these benchmark datasets.
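To make the stability notion concrete, the following is a minimal sketch in Python, not the algorithm from the paper. It computes the k-means cost, runs Lloyd's heuristic, and empirically checks whether the clustering of a toy 2-D instance survives random perturbations of Euclidean length at most `eps`. The function names, the toy points, and `eps = 0.5` are assumptions made for illustration. Random perturbations only give evidence of stability (the definition concerns every admissible perturbation), and Lloyd's heuristic is not guaranteed to return the optimal clustering.

```python
import math
import random


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    cost = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        dim = len(members[0])
        mean = [sum(p[i] for p in members) / len(members) for i in range(dim)]
        cost += sum(sum((p[i] - mean[i]) ** 2 for i in range(dim)) for p in members)
    return cost


def lloyd(points, centers, iters=50):
    """Lloyd's heuristic: alternate assignment and mean-update steps."""
    k = len(centers)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest current center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(p[i] for p in members) / len(members) for i in range(dim))
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def perturb(points, eps, rng):
    """Move every 2-D point by a vector of Euclidean length at most eps."""
    out = []
    for x, y in points:
        r = eps * rng.random()
        theta = 2 * math.pi * rng.random()
        out.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return out


rng = random.Random(0)
# Hypothetical toy instance: two well-separated groups of three points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
init = [points[0], points[3]]
base = lloyd(points, list(init))
# Empirical check: does the partition stay the same under 20 random perturbations?
same = all(lloyd(perturb(points, 0.5, rng), list(init)) == base for _ in range(20))
```

Here `same` only reports whether the sampled perturbations left the partition unchanged; it does not certify stability in the sense of the abstract.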
Pages: 10
Related papers
50 in total
  • [1] On Euclidean k-Means Clustering with α-Center Proximity
    Deshpande, Amit
    Louis, Anand
    Singh, Apoorv Vikram
    [J]. 22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [2] AN EXACT ALGORITHM FOR STABLE INSTANCES OF THE k-MEANS PROBLEM WITH PENALTIES IN FIXED-DIMENSIONAL EUCLIDEAN SPACE
    Yuan, Fan
    Xu, Dachuan
    Du, Donglei
    Li, Min
    [J]. JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2021, 18 (05) : 3487 - 3498
  • [3] Stable Initialization Scheme for K-Means Clustering
    Xu, Junling
    [J]. Wuhan University Journal of Natural Sciences, 2009, 14 (01) : 24 - 28
  • [4] Learning Assignment Order of Instances for the Constrained K-Means Clustering Algorithm
    Hong, Yi
    Kwong, Sam
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2009, 39 (02) : 568 - 574
  • [5] A Performance Comparison of Euclidean, Manhattan and Minkowski Distances in K-Means Clustering
    Haviluddin
    Iqbal, Muhammad
    Putra, Gubtha Mahendra
    Puspitasari, Novianti
    Setyadi, Hario Jati
    Dwiyanto, Felix Andika
    Wibawa, Aji Prasetya
    Alfred, Rayner
    [J]. 2020 6TH INTERNATIONAL CONFERENCE ON SCIENCE IN INFORMATION TECHNOLOGY (ICSITECH): EMBRACING INDUSTRY 4.0: TOWARDS INNOVATION IN DISASTER MANAGEMENT, 2020, : 184 - 188
  • [6] ON CORESETS FOR k-MEDIAN AND k-MEANS CLUSTERING IN METRIC AND EUCLIDEAN SPACES AND THEIR APPLICATIONS
    Chen, Ke
    [J]. SIAM JOURNAL ON COMPUTING, 2009, 39 (03) : 923 - 947
  • [7] Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm
    Shi Na
    Liu Xumin
    Guan Yong
    [J]. 2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010, : 63 - 67
  • [8] A refined approximation for Euclidean k-means
    Grandoni, Fabrizio
    Ostrovsky, Rafail
    Rabani, Yuval
    Schulman, Leonard J.
    Venkat, Rakesh
    [J]. INFORMATION PROCESSING LETTERS, 2022, 176
  • [9] Improved Coresets for Euclidean k-Means
    Cohen-Addad, Vincent
    Larsen, Kasper Green
    Saulpic, David
    Schwiegelshohn, Chris
    Sheikh-Omar, Omar Ali
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [10] K-Means Cloning: Adaptive Spherical K-Means Clustering
    Hedar, Abdel-Rahman
    Ibrahim, Abdel-Monem M.
    Abdel-Hakim, Alaa E.
    Sewisy, Adel A.
    [J]. ALGORITHMS, 2018, 11 (10)