Probabilistic reduced K-means cluster analysis

Cited by: 0

Authors:

Lee, Seunghoon [1]

Song, Juwon [1]

Affiliations:

[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea

Source:

KOREAN JOURNAL OF APPLIED STATISTICS | 2021 / Vol. 34 / No. 06

Keywords:

cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL;

DOI:

10.5351/KJAS.2021.34.6.905

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification code:

020208 ; 070103 ; 0714 ;

Abstract:

Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, in which K-means clustering is applied after the number of variables has been reduced by a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a similar or better cluster structure compared to other methods.
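The masking effect described in the abstract can be illustrated with a small sketch. It is not the authors' method and implements neither reduced K-means nor the proposed probabilistic version. It assumes a hypothetical two-variable data set in which x1 separates two clusters and x2 is pure noise with a much larger variance, then compares tandem analysis (first principal component followed by 2-means) with 2-means on x1 alone, an oracle projection that a joint method would aim to recover. All function names and parameter values are assumptions made for illustration.

```python
import math
import random


def make_data(n_per_cluster=100, seed=0):
    """Two clusters separated along x1; x2 is high-variance noise (assumed setup)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate((-2.0, 2.0)):
        for _ in range(n_per_cluster):
            x1 = rng.gauss(center, 0.5)  # carries the cluster structure
            x2 = rng.gauss(0.0, 10.0)    # unrelated to the clusters
            points.append((x1, x2))
            labels.append(label)
    return points, labels


def first_pc_scores(points):
    """Scores on the first principal component of 2-D data (closed form)."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    u1, u2 = math.cos(theta), math.sin(theta)
    return [(p[0] - m1) * u1 + (p[1] - m2) * u2 for p in points]


def two_means_1d(values, max_iter=100):
    """Lloyd's algorithm with K = 2 on one-dimensional data."""
    c0, c1 = min(values), max(values)  # simple deterministic initialisation
    assign = None
    for _ in range(max_iter):
        new_assign = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        if new_assign == assign:
            break
        assign = new_assign
        g0 = [v for v, a in zip(values, assign) if a == 0]
        g1 = [v for v, a in zip(values, assign) if a == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return assign


def accuracy(assign, labels):
    """Agreement with the true labels, allowing the two labels to be swapped."""
    agree = sum(a == b for a, b in zip(assign, labels)) / len(labels)
    return max(agree, 1.0 - agree)


points, labels = make_data()
acc_tandem = accuracy(two_means_1d(first_pc_scores(points)), labels)
acc_oracle = accuracy(two_means_1d([p[0] for p in points]), labels)
print("tandem (PC1 then 2-means):", acc_tandem)
print("oracle (x1 then 2-means): ", acc_oracle)
```

In this setup the first principal component aligns almost entirely with the noisy variable x2, so the tandem clustering is expected to do little better than chance, while clustering on x1 recovers the groups.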


Pages: 905-922

Number of pages: 18
