An extension of the K-means algorithm to clustering skewed data

Cited by: 6

Authors:

Melnykov, Volodymyr [1]

Zhu, Xuwen [2]

Affiliations:

[1] Univ Alabama, Tuscaloosa, AL 35487 USA

[2] Univ Louisville, Louisville, KY 40292 USA

Source:

COMPUTATIONAL STATISTICS | 2019 / Vol. 34 / Issue 01

Keywords:

Exponential transformation; CEM algorithm; Cluster analysis; Skewness; R-PACKAGE; SIMULATING DATA; CLASSIFICATION; PERFORMANCE;

DOI:

10.1007/s00180-018-0821-z

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Grouping similar objects into common groups, also known as clustering, is an important problem of unsupervised machine learning. Various clustering algorithms have been proposed in the literature. In recent years, the need to analyze large amounts of data has led to reconsidering some fundamental clustering procedures. One of them is the celebrated K-means algorithm, popular among practitioners due to its speedy performance and appealingly intuitive construction. Unfortunately, the algorithm often shows poor performance unless data groups have spherical shapes and approximately the same size. In many applications, this restriction is so severe that the use of the K-means algorithm becomes questionable, misleading, or simply incorrect. We propose an extension of K-means that preserves the speed and intuitive interpretation of the original algorithm while providing greater flexibility in modeling clusters. The idea of the proposed generalization relies on the exponential transformation of Manly, originally designed to obtain near-normally distributed data. The suggested modification is derived and illustrated on several datasets with good results.
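As a rough illustration of the transformation the abstract refers to, the sketch below applies Manly's exponential transformation, y = (exp(λx) − 1)/λ for λ ≠ 0 and y = x for λ = 0, to right-skewed data and compares sample skewness before and after. This is not the authors' algorithm: the function names `manly_transform` and `sample_skewness`, the simulated exponential data, and the fixed choice λ = −1 are assumptions made only for this example, and the abstract does not say how the paper selects the transformation parameters.

```python
import math
import random


def manly_transform(x, lam):
    """Manly exponential transformation of a single value.

    Returns (exp(lam * x) - 1) / lam for lam != 0, and x itself for lam == 0.
    """
    if lam == 0.0:
        return x
    # expm1 computes exp(z) - 1 accurately for small z.
    return math.expm1(lam * x) / lam


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample (exponential distribution, rate 1).
rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(5000)]

# lam = -1.0 is an illustrative choice, not a value taken from the paper.
transformed = [manly_transform(v, -1.0) for v in skewed]

print("skewness before:", sample_skewness(skewed))
print("skewness after: ", sample_skewness(transformed))
```

A negative λ compresses the long right tail, so the transformed sample is much closer to symmetric than the original; a positive λ would have the opposite effect and suits left-skewed data.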


Pages: 373-394

Page count: 22


