An extension of the K-means algorithm to clustering skewed data

Cited by: 6

Authors:

Melnykov, Volodymyr [1]

Zhu, Xuwen [2]

Affiliations:

[1] Univ Alabama, Tuscaloosa, AL 35487 USA

[2] Univ Louisville, Louisville, KY 40292 USA

Source:

COMPUTATIONAL STATISTICS | 2019 / Vol. 34 / No. 01

Keywords:

Exponential transformation; CEM algorithm; Cluster analysis; Skewness; R-PACKAGE; SIMULATING DATA; CLASSIFICATION; PERFORMANCE;

DOI:

10.1007/s00180-018-0821-z

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification code:

020208 ; 070103 ; 0714 ;

Abstract:

Grouping similar objects into common groups, also known as clustering, is an important problem of unsupervised machine learning. Various clustering algorithms have been proposed in the literature. In recent years, the need to analyze large amounts of data has led to reconsidering some fundamental clustering procedures. One of them is the celebrated K-means algorithm, popular among practitioners due to its speedy performance and appealingly intuitive construction. Unfortunately, the algorithm often shows poor performance unless data groups have spherical shapes and approximately the same size. In many applications, this restriction is so severe that the use of the K-means algorithm becomes questionable, misleading, or simply incorrect. We propose an extension of K-means that preserves the speed and intuitive interpretation of the original algorithm while providing greater flexibility in modeling clusters. The idea of the proposed generalization relies on the exponential transformation of Manly, originally designed to obtain near-normally distributed data. The suggested modification is derived and illustrated on several datasets with good results.
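
For orientation, the Manly exponential transformation named in the abstract maps a value x to (exp(λx) − 1)/λ when λ ≠ 0 and leaves x unchanged when λ = 0. The following is a minimal Python sketch of that transformation only. The function name `manly_transform` and the parameter name `lam` are illustrative assumptions, and the abstract does not describe how the authors embed the transformation in K-means, so that step is not shown.

```python
import math


def manly_transform(x, lam):
    """Manly exponential transformation of a single value.

    Returns (exp(lam * x) - 1) / lam for lam != 0, and x itself for lam == 0.
    The name and signature are hypothetical, chosen for illustration.
    """
    if lam == 0.0:
        # The lam -> 0 limit of the transformation is the identity.
        return x
    # expm1 computes exp(t) - 1 accurately for small t.
    return math.expm1(lam * x) / lam


# A positive lam stretches large values; a negative lam compresses them.
stretched = manly_transform(1.0, 0.5)
compressed = manly_transform(1.0, -0.5)
```

A skewness parameter of this kind is typically chosen per variable so that the transformed data look closer to normal; how it is estimated within the proposed algorithm is a detail of the paper that this record does not state.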

Pages: 373-394

Page count: 22

