Probabilistic reduced K-means cluster analysis

Cited by: 0

Authors:

Lee, Seunghoon [1]

Song, Juwon [1]

Affiliation:

[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea

Source:

KOREAN JOURNAL OF APPLIED STATISTICS | 2021, Vol. 34, No. 6

Keywords:

cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL;

DOI:

10.5351/KJAS.2021.34.6.905

Chinese Library Classification (CLC) number:

O21 [Probability theory and mathematical statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis applied after the number of variables has been reduced with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, the proposed method revealed a similar or better cluster structure compared to other methods.
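The masking effect of tandem analysis described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' method or their simulation design: the data (one cluster-relevant variable plus five high-variance noise variables), the helper names `kmeans_labels`, `pca_scores`, and `two_cluster_accuracy`, and all parameter values are hypothetical choices made for illustration. It compares K-means on the first principal component with K-means on the single variable that actually carries the cluster structure.

```python
import numpy as np

def kmeans_labels(X, k, n_init=10, n_iter=50, seed=0):
    """Lloyd's K-means with random restarts; returns the lowest-loss labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_loss = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Recompute each center; keep the old one if a cluster empties.
            centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        loss = dist.min(axis=1).sum()
        if loss < best_loss:
            best_loss, best_labels = loss, labels
    return best_labels

def pca_scores(X, q):
    """Scores on the first q principal components (covariance-based PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:q].T

def two_cluster_accuracy(labels, truth):
    """Agreement with the true groups, invariant to swapping the two labels."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)

# Hypothetical data: variable 0 separates the two groups (means -3 and +3, sd 1);
# variables 1..5 are pure noise with a much larger variance (sd 10).
rng = np.random.default_rng(42)
n = 100
truth = np.repeat([0, 1], n)
signal = np.where(truth == 0, -3.0, 3.0) + rng.normal(0.0, 1.0, 2 * n)
noise = rng.normal(0.0, 10.0, size=(2 * n, 5))
X = np.column_stack([signal, noise])

# Tandem analysis: PCA first, then K-means on the retained component.
tandem_acc = two_cluster_accuracy(kmeans_labels(pca_scores(X, 1), 2), truth)
# Reference point: K-means on the cluster-relevant variable only.
oracle_acc = two_cluster_accuracy(kmeans_labels(X[:, :1], 2), truth)

print(f"tandem (PC1 + K-means): {tandem_acc:.2f}")
print(f"K-means on the cluster-relevant variable: {oracle_acc:.2f}")
```

Because the first principal component follows the direction of largest variance, it is dominated here by the noise variables, so the tandem result should stay close to chance while the reference result recovers the groups. The reference uses knowledge of the relevant variable that is not available in practice; simultaneous methods such as reduced K-means aim to find such a subspace from the data.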

Pages: 905-922

Page count: 18
