On the quality of k-means clustering based on grouped data

Cited by: 2
Authors
Kaeaerik, Meelis [1]
Paerna, Kalev [1]
Affiliation
[1] Univ Tartu, Inst Stat Math, EE-50090 Tartu, Estonia
Keywords
Grouped data; k-Means; Lloyd's algorithm; Loss-function; Voronoi partitions; QUANTIZATION
DOI
10.1016/j.jspi.2009.05.021
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Let P be a probability distribution (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e., a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129-136]. However, depending on the complexity of the distribution P, applying this algorithm can be quite resource-intensive. One way to overcome the problem is to group the original data and calculate the k-mean on the basis of the grouped data. As a result, the new k-mean will be biased, and our aim is to measure the loss of quality of approximation caused by this approach. (C) 2009 Elsevier B.V. All rights reserved.
Pages: 3836-3841
Number of pages: 6
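The abstract leaves both the loss function and the grouping scheme unspecified, so the Python sketch below is illustrative only and rests on two assumptions: squared-error loss, under which Lloyd's centroid step is a weighted mean, and a hypothetical grouping rule that replaces every interval of length `width` by its midpoint, weighted by the interval's frequency. The function names (`lloyd_1d`, `mean_loss`, `group`, `compare_grouped`) are invented for this example and do not come from the paper. The sketch fits k centres to the raw data and to the grouped data, then evaluates both centre sets on the raw data; the excess loss of the grouped solution is one possible way to quantify the loss of quality the abstract describes.

```python
import math
from collections import Counter


def lloyd_1d(points, weights, k, max_iter=100, tol=1e-12):
    """Weighted Lloyd iteration on the real line, ASSUMING squared-error loss.

    Returns a sorted list of at most k centres. Each step never increases
    the loss, but the fixed point reached may be only a local minimum.
    """
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    distinct = sorted(set(points))
    k = min(k, len(distinct))
    # Deterministic start: k initial centres spread over the sorted distinct points.
    centres = [distinct[(i * (len(distinct) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        sums, mass = [0.0] * k, [0.0] * k
        for x, w in zip(points, weights):
            # Assignment step: on R the Voronoi cells are intervals,
            # so assigning each point to its nearest centre is enough.
            j = min(range(k), key=lambda i: abs(x - centres[i]))
            sums[j] += w * x
            mass[j] += w
        # Centroid step: the weighted mean minimises squared error within a cell;
        # a centre whose cell is empty stays where it is.
        new = [sums[j] / mass[j] if mass[j] > 0 else centres[j] for j in range(k)]
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift < tol:
            break
    return sorted(centres)


def mean_loss(points, weights, centres):
    """Weighted mean squared distance from each point to its nearest centre."""
    total = sum(w * min((x - c) ** 2 for c in centres) for x, w in zip(points, weights))
    return total / sum(weights)


def group(data, width, origin=0.0):
    """HYPOTHETICAL grouping rule: bin data into [origin + b*width, origin + (b+1)*width)
    and represent each non-empty bin by its midpoint, weighted by the bin count."""
    counts = Counter(math.floor((x - origin) / width) for x in data)
    bins = sorted(counts)
    return [origin + (b + 0.5) * width for b in bins], [float(counts[b]) for b in bins]


def compare_grouped(data, k, width, origin=0.0):
    """Fit k centres on raw and on grouped data; evaluate both on the raw data."""
    ones = [1.0] * len(data)
    raw_centres = lloyd_1d(data, ones, k)
    mids, freq = group(data, width, origin)
    grouped_centres = lloyd_1d(mids, freq, k)
    return (mean_loss(data, ones, raw_centres),
            mean_loss(data, ones, grouped_centres),
            raw_centres, grouped_centres)


if __name__ == "__main__":
    sample = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]
    raw_loss, grouped_loss, raw_c, grouped_c = compare_grouped(sample, k=2, width=1.0)
    print(raw_c, grouped_c, grouped_loss - raw_loss)
```

Evaluating both centre sets on the original data keeps the comparison on the same distribution P. Because Lloyd's iteration can stop at a local minimum, the computed difference is in general only an estimate of the true loss of quality and is not guaranteed to be non-negative.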
Related papers (50 in total)
  • [41] A Clustering Method Based on K-Means Algorithm
    Li, Youguo
    Wu, Haiyan
    INTERNATIONAL CONFERENCE ON SOLID STATE DEVICES AND MATERIALS SCIENCE, 2012, 25 : 1104 - 1109
  • [42] Distributed Clustering Based on K-means and CPGA
    Zhou, Jun
    Liu, Zhijing
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 444 - 447
  • [43] A Novel MapReduce Based k-Means Clustering
    Sinha, Ankita
    Jana, Prasanta K.
    PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND COMMUNICATION, 2017, 458 : 247 - 255
  • [44] Entropy Based Soft K-means Clustering
    Bai, Xue
    Luo, Siwei
    Zhao, Yibiao
    2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2008, : 107 - 110
  • [45] Locality Preserving Based K-Means Clustering
    Yang, Xiaohuan
    Wang, Xiaoming
    Tian, Yong
    Du, Yajun
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING TECHNIQUES, ISCIDE 2015, PT II, 2015, 9243 : 86 - 95
  • [47] Mahalanobis Distance Based K-Means Clustering
    Brown, Paul O.
    Chiang, Meng Ching
    Guo, Shiqing
    Jin, Yingzi
    Leung, Carson K.
    Murray, Evan L.
    Pazdor, Adam G. M.
    Cuzzocrea, Alfredo
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2022, 2022, 13428 : 256 - 262
  • [48] Global regionalization of heat environment quality perception based on K-means clustering and Google trends data
    Kim, Yesuel
    Kim, Youngchul
    SUSTAINABLE CITIES AND SOCIETY, 2023, 96
  • [49] A Fuzzy Clustering Algorithm Based on K-means
    Yan, Zhen
    Pi, Dechang
    ECBI: 2009 INTERNATIONAL CONFERENCE ON ELECTRONIC COMMERCE AND BUSINESS INTELLIGENCE, PROCEEDINGS, 2009, : 523 - 528
  • [50] Parallel K-Means Clustering Based on MapReduce
    Zhao, Weizhong
    Ma, Huifang
    He, Qing
    CLOUD COMPUTING, PROCEEDINGS, 2009, 5931 : 674 - 679