An extension of the K-means algorithm to clustering skewed data

Times cited: 6
Authors
Melnykov, Volodymyr [1 ]
Zhu, Xuwen [2 ]
Affiliations
[1] Univ Alabama, Tuscaloosa, AL 35487 USA
[2] Univ Louisville, Louisville, KY 40292 USA
Keywords
Exponential transformation; CEM algorithm; Cluster analysis; Skewness; R-PACKAGE; SIMULATING DATA; CLASSIFICATION; PERFORMANCE;
DOI
10.1007/s00180-018-0821-z
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
Grouping similar objects into common groups, also known as clustering, is an important problem of unsupervised machine learning. Various clustering algorithms have been proposed in the literature. In recent years, the need to analyze large amounts of data has led to reconsidering some fundamental clustering procedures. One of them is the celebrated K-means algorithm, popular among practitioners due to its speed and appealingly intuitive construction. Unfortunately, the algorithm often performs poorly unless the data groups have spherical shapes and approximately the same sizes. In many applications, this restriction is so severe that the use of the K-means algorithm becomes questionable, misleading, or simply incorrect. We propose an extension of K-means that preserves the speed and intuitive interpretation of the original algorithm while providing greater flexibility in modeling clusters. The proposed generalization relies on the exponential transformation of Manly, originally designed to obtain near-normally distributed data. The suggested modification is derived and illustrated on several datasets with good results.
Pages: 373-394
Page count: 22
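
The abstract refers to the exponential transformation of Manly. As a minimal sketch, assuming the standard univariate form y = (exp(λx) - 1)/λ for λ ≠ 0 and y = x for λ = 0, the snippet below applies it to a small right-skewed sample and compares sample skewness before and after. It does not reproduce the authors' algorithm: the function names, the toy data, and the hand-picked value λ = -1 are illustrative assumptions, whereas the paper estimates the transformation parameters as part of the clustering procedure.

```python
import math


def manly_transform(x, lam):
    """Manly exponential transformation of one value (standard univariate form)."""
    if lam == 0.0:
        # The transformation reduces to the identity when lambda is zero.
        return x
    return (math.exp(lam * x) - 1.0) / lam


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy right-skewed sample (illustrative, not taken from the paper).
data = [0.1, 0.2, 0.3, 0.5, 0.8, 1.3, 2.1, 3.4]

# A negative lambda compresses the right tail; -1.0 is chosen by hand here.
transformed = [manly_transform(v, -1.0) for v in data]

print("skewness before:", skewness(data))
print("skewness after: ", skewness(transformed))
```

With a negative λ the long right tail is pulled in, so the transformed sample is closer to symmetric, which is the property that makes a K-means-style distance criterion more reasonable on skewed clusters.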