An extension of the K-means algorithm to clustering skewed data

Cited by: 6
Authors
Melnykov, Volodymyr [1]
Zhu, Xuwen [2]
Affiliations
[1] Univ Alabama, Tuscaloosa, AL 35487 USA
[2] Univ Louisville, Louisville, KY 40292 USA
Keywords
Exponential transformation; CEM algorithm; Cluster analysis; Skewness; R-package; Simulating data; Classification; Performance
DOI
10.1007/s00180-018-0821-z
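Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract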
Grouping similar objects into common groups, also known as clustering, is an important problem of unsupervised machine learning. Various clustering algorithms have been proposed in the literature. In recent years, the need to analyze large amounts of data has led to reconsidering some fundamental clustering procedures. One of them is the celebrated K-means algorithm, popular among practitioners for its speed and appealingly intuitive construction. Unfortunately, the algorithm often performs poorly unless the data groups are spherical and of approximately the same size. In many applications, this restriction is so severe that the use of the K-means algorithm becomes questionable, misleading, or simply incorrect. We propose an extension of K-means that preserves the speed and intuitive interpretation of the original algorithm while providing greater flexibility in modeling clusters. The proposed generalization relies on the exponential transformation of Manly, originally designed to obtain near-normally distributed data. The suggested modification is derived and illustrated on several datasets with good results.
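As a rough illustration of the transformation the abstract refers to, the Manly exponential transformation maps x to (exp(λx) − 1)/λ for λ ≠ 0 and leaves x unchanged for λ = 0. The sketch below applies it to a right-skewed sample and compares sample skewness before and after. It is not the authors' algorithm: the helper names `manly` and `skewness`, the simulated log-normal data, and the value λ = −1 are all assumptions chosen only for illustration, and no parameter estimation or clustering step is shown.

```python
import math
import random


def manly(x, lam):
    """Manly exponential transformation of a scalar x with parameter lam."""
    if abs(lam) < 1e-12:
        return x  # the limit lam -> 0 is the identity map
    return math.expm1(lam * x) / lam  # (exp(lam * x) - 1) / lam


def skewness(values):
    """Sample skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)
# Hypothetical right-skewed data: log-normal draws.
sample = [math.exp(random.gauss(0.0, 0.5)) for _ in range(2000)]

lam = -1.0  # illustrative value only, not estimated from the data
transformed = [manly(v, lam) for v in sample]

print("skewness before:", skewness(sample))
print("skewness after: ", skewness(transformed))
```

A negative λ compresses the right tail, which is why the transformed sample is closer to symmetric here; in practice λ would have to be estimated from the data rather than fixed by hand.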
Pages: 373-394
Number of pages: 22
Source
Computational Statistics, 2019, 34: 373-394