On the quality of k-means clustering based on grouped data

Cited by: 2
Authors
Kaeaerik, Meelis [1]
Paerna, Kalev [1]
Affiliations
[1] Univ Tartu, Inst Stat Math, EE-50090 Tartu, Estonia
Keywords
Grouped data; k-Means; Lloyd's algorithm; Loss-function; Voronoi partitions; Quantization
DOI
10.1016/j.jspi.2009.05.021
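Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714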
Abstract
Let P be a probability distribution (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e., a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129-136]. However, depending on the complexity of the distribution P, applying this algorithm can be quite resource-consuming. One way to overcome this problem is to group the original data and calculate the k-mean on the basis of the grouped data. As a result, the new k-mean will be biased, and our aim is to measure the loss in approximation quality caused by this approach. (C) 2009 Elsevier B.V. All rights reserved.
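The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a squared-error loss, equal-width bins, and bin means as group representatives, none of which is stated in the abstract. The helper names (`lloyd_1d`, `group`, `loss`) and the synthetic two-component sample are hypothetical. The sketch runs Lloyd's iteration once on the raw sample and once on the grouped sample, then evaluates both sets of centers on the raw data.

```python
import math
import random


def lloyd_1d(points, weights, k, iters=100):
    """Weighted Lloyd iteration on the real line (squared-error loss assumed)."""
    lo, hi = min(points), max(points)
    # Deterministic start: k evenly spaced centers inside [lo, hi].
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        sums = [0.0] * k
        mass = [0.0] * k
        # Assignment step: each point goes to its nearest center (Voronoi cell).
        for x, w in zip(points, weights):
            j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            sums[j] += w * x
            mass[j] += w
        # Update step: each center moves to the weighted mean of its cell;
        # an empty cell keeps its previous center.
        new = [sums[j] / mass[j] if mass[j] > 0 else centers[j] for j in range(k)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def loss(points, weights, centers):
    """Weighted mean squared distance to the nearest center."""
    total = sum(weights)
    return sum(w * min((x - a) ** 2 for a in centers)
               for x, w in zip(points, weights)) / total


def group(points, width):
    """Bin points into intervals of the given width.

    Each bin is represented by the mean of its points (an assumption),
    weighted by the number of points it contains.
    """
    bins = {}
    for x in points:
        key = math.floor(x / width)
        s, n = bins.get(key, (0.0, 0))
        bins[key] = (s + x, n + 1)
    reps = [s / n for s, n in bins.values()]
    counts = [n for _, n in bins.values()]
    return reps, counts


# Hypothetical sample: a mixture of two normal components.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(2000)] + [rng.gauss(5, 1) for _ in range(2000)]
ones = [1.0] * len(data)

full = lloyd_1d(data, ones, k=2)           # centers from the raw data
reps, counts = group(data, width=0.5)      # grouped data
grouped = lloyd_1d(reps, counts, k=2)      # centers from the grouped data

loss_full = loss(data, ones, full)
loss_grouped = loss(data, ones, grouped)   # grouped centers judged on raw data
print("centers (raw):    ", full)
print("centers (grouped):", grouped)
print("loss difference:  ", loss_grouped - loss_full)
```

The grouped run touches only one weighted point per bin instead of every observation, which is where the saving comes from. The printed loss difference is one simple way to quantify the quality lost by grouping under the assumptions above.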
Pages: 3836-3841
Number of pages: 6
Related papers
50 in total
  • [1] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [2] Authentication of uncertain data based on k-means clustering
    Unver, Levent
    Gundem, Taflan I.
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2016, 24 (04) : 2910 - 2928
  • [3] A K-means Based Genetic Algorithm for Data Clustering
    Pizzuti, Clara
    Procopio, Nicola
    [J]. INTERNATIONAL JOINT CONFERENCE SOCO'16- CISIS'16-ICEUTE'16, 2017, 527 : 211 - 222
  • [4] K-Means Clustering With Incomplete Data
    Wang, Siwei
    Li, Miaomiao
    Hu, Ning
    Zhu, En
    Hu, Jingtao
    Liu, Xinwang
    Yin, Jianping
    [J]. IEEE ACCESS, 2019, 7 : 69162 - 69171
  • [5] k-Means Clustering of Asymmetric Data
    Olszewski, Dominik
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT I, 2012, 7208 : 243 - 254
  • [6] A Quality Metric for K-Means Clustering Based on Centroid Locations
    Thulasidas, Manoj
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II, 2022, 13726 : 208 - 222
  • [7] Soil data clustering by using K-means and fuzzy K-means algorithm
    Hot, Elma
    Popovic-Bugarin, Vesna
    [J]. 2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 890 - 893
  • [8] A hierarchical k-means clustering based fingerprint quality classification
    Munir, Muhammad Umer
    Javed, Muhammad Younus
    Khan, Shoab Ahmad
    [J]. NEUROCOMPUTING, 2012, 85 : 62 - 67
  • [9] IMPROVEMENT IN K-MEANS CLUSTERING ALGORITHM FOR DATA CLUSTERING
    Rajeswari, K.
    Acharya, Omkar
    Sharma, Mayur
    Kopnar, Mahesh
    Karandikar, Kiran
    [J]. 1ST INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION ICCUBEA 2015, 2015, : 367 - 369
  • [10] The fast clustering algorithm for the big data based on K-means
    Xie, Ting
    Zhang, Taiping
    [J]. INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (06)