Probabilistic reduced K-means cluster analysis

Times cited: 0
Authors
Lee, Seunghoon [1]
Song, Juwon [1]
Affiliation
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL;
DOI
10.5351/KJAS.2021.34.6.905
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis after reducing the number of variables with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been proposed. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed cluster structure similar to or better than that found by other methods.
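The non-probabilistic reduced K-means that this paper builds on can be sketched with alternating least squares. This is a minimal sketch, assuming hard (0/1) cluster memberships and the criterion ||X - U C A^T||^2, where U is an n x K membership matrix, C holds K centroids in a q-dimensional space, and A is a column-orthonormal p x q loading matrix. It is not the probabilistic (EM-based) method proposed in the paper. The function name `reduced_kmeans`, its parameters, and the random initial partition are illustrative assumptions.

```python
import numpy as np


def reduced_kmeans(X, n_clusters, n_components, n_iter=100, seed=0):
    """Hypothetical sketch of non-probabilistic reduced K-means.

    Alternates between (1) choosing the loadings A as the top eigenvectors of
    the between-cluster scatter X^T P_U X and (2) a K-means assignment step in
    the reduced space X A. Assumes hard memberships; not the paper's method.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                      # column-center the data
    n, _ = X.shape
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    for _ in range(n_iter):
        # One-hot membership matrix U (n x K) and cluster means (K x p).
        U = np.eye(n_clusters)[labels]
        counts = U.sum(axis=0)
        counts[counts == 0] = 1                 # guard against empty clusters
        means = (U.T @ X) / counts[:, None]
        # U @ means equals P_U X, so B = X^T P_U X (between-cluster scatter).
        PX = U @ means
        B = PX.T @ PX
        # Update A: eigenvectors of B for the q largest eigenvalues
        # (eigh returns them in ascending order, so reverse the columns).
        _, eigvecs = np.linalg.eigh(B)
        A = eigvecs[:, ::-1][:, :n_components]
        # Update U: assign each projected point to its nearest reduced centroid.
        Z = X @ A
        C = means @ A                           # centroids in the reduced space
        dist = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                               # partition stopped changing
        labels = new_labels
    return labels, A
```

A call such as `labels, A = reduced_kmeans(X, n_clusters=3, n_components=2)` returns a partition and loadings. Tandem analysis would instead fix A to the leading principal component loadings of X before clustering. Here A is re-estimated from the between-cluster scatter at every iteration, so high-variance variables unrelated to the clusters do not automatically dominate the reduced space.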
Pages: 905-922
Number of pages: 18
Related papers
50 in total
  • [31] Strong Consistency of Reduced K-means Clustering
    Terada, Yoshikazu
    SCANDINAVIAN JOURNAL OF STATISTICS, 2014, 41 (04) : 913 - 931
  • [32] k*-means: A generalized k-means clustering algorithm with unknown cluster number
    Cheung, YM
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2002, 2002, 2412 : 307 - 317
  • [33] Assessment of social vulnerability to groundwater pollution using K-means cluster analysis
    Uzcategui-Salazar, Marisela
    Lillo, Javier
    ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH, 2023, 30 (06) : 14975 - 14992
  • [34] A NEW HYBRID ALGORITHM BASED ON PSO, SA, AND K-MEANS FOR CLUSTER ANALYSIS
    Firouzi, Bahman Bahmani
    Sadeghi, Mokhtar Sha
    Niknam, Taher
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2010, 6 (07): 3177 - 3192
  • [35] An efficient hybrid approach based on PSO, ABC and k-means for cluster analysis
    Pu, Qiumei
    Gan, Jingkai
    Qiu, Lirong
    Duan, Jiaxin
    Wang, Hui
    Multimedia Tools and Applications, 2022, 81 : 19321 - 19339
  • [36] An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis
    Niknam, Taher
    Amiri, Babak
    APPLIED SOFT COMPUTING, 2010, 10 (01) : 183 - 197
  • [37] Raman imaging based on K-means cluster analysis for human breast tissues
    Yu, Ge
    Lu, Ai-Jun
    Wang, Bin
    Xu, Xiao-Xuan
    Guangdianzi Jiguang/Journal of Optoelectronics Laser, 2012, 23 (11): 2243 - 2248
  • [38] MIKCA - FORTRAN-IV ITERATIVE K-MEANS CLUSTER ANALYSIS PROGRAM
    MCRAE, DJ
    BEHAVIORAL SCIENCE, 1971, 16 (04): 423 - &
  • [39] Analysis of Industrial Productivity in the Bogor Regency Region using K-Means Cluster
    Pahmi, Muhamad Ali
    Imtihan, Miftahul
    Mastang
    Wilarso
    IWAIIP 2023 - Conference Proceeding: International Workshop on Artificial Intelligence and Image Processing, 2023, : 294 - 299
  • [40] Cluster Analysis using A Gradient Evolution-based K-means Algorithm
    Kuo, R. J.
    Zulvia, Ferani E.
    2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016, : 5138 - 5145