Probabilistic reduced K-means cluster analysis

Cited by: 0
Authors
Lee, Seunghoon [1]
Song, Juwon [1]
Affiliations
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL
DOI
10.5351/KJAS.2021.34.6.905
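Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714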
Abstract
Cluster analysis is one of the unsupervised learning techniques used to discover clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, in which K-means cluster analysis is applied after the number of variables has been reduced by a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. When applied to a real data set, it revealed a similar or better cluster structure compared to other methods.
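The masking effect described in the abstract can be illustrated with a small toy sketch. This is not the authors' method, simulation design, or data: the two-variable data set, the parameter values, and the helper names (make_data, first_pc_scores, kmeans_1d, agreement) are all assumptions made here for illustration. One variable carries the cluster structure and the other is high-variance noise, so the first principal component follows the noise and tandem analysis (PCA, then K-means) loses the clusters.

```python
import math
import random


def make_data(n_per_cluster=100, seed=0):
    """Toy data (assumed): variable 0 separates two clusters, variable 1 is noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, center in enumerate((-2.0, 2.0)):
        for _ in range(n_per_cluster):
            signal = rng.gauss(center, 0.5)  # carries the cluster structure
            noise = rng.gauss(0.0, 10.0)     # large variance, unrelated to clusters
            rows.append((signal, noise))
            labels.append(label)
    return rows, labels


def first_pc_scores(rows, iters=200):
    """Scores on the first principal component of 2-D data, via power iteration."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    centered = [(r[0] - m0, r[1] - m1) for r in rows]
    # Entries of the 2x2 sample covariance matrix.
    c00 = sum(a * a for a, _ in centered) / (n - 1)
    c01 = sum(a * b for a, b in centered) / (n - 1)
    c11 = sum(b * b for _, b in centered) / (n - 1)
    v0, v1 = 1.0, 1.0
    for _ in range(iters):
        w0 = c00 * v0 + c01 * v1
        w1 = c01 * v0 + c11 * v1
        norm = math.hypot(w0, w1)
        v0, v1 = w0 / norm, w1 / norm
    return [a * v0 + b * v1 for a, b in centered]


def kmeans_1d(values, iters=100):
    """Two-cluster K-means on scalars, initialized at the minimum and maximum."""
    c0, c1 = min(values), max(values)
    assign = []
    for _ in range(iters):
        new_assign = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        g0 = [v for v, a in zip(values, assign) if a == 0]
        g1 = [v for v, a in zip(values, assign) if a == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return assign


def agreement(assign, labels):
    """Share of points matching the true labels, up to swapping cluster names."""
    match = sum(a == t for a, t in zip(assign, labels)) / len(labels)
    return max(match, 1.0 - match)


if __name__ == "__main__":
    rows, labels = make_data()
    direct = agreement(kmeans_1d([r[0] for r in rows]), labels)
    tandem = agreement(kmeans_1d(first_pc_scores(rows)), labels)
    print(f"K-means on the informative variable: {direct:.2f}")
    print(f"Tandem (first PC, then K-means):     {tandem:.2f}")
```

In this setup the tandem agreement should stay close to chance level, while clustering on the informative variable recovers the groups almost perfectly. Methods that estimate the subspace and the clusters jointly, such as reduced K-means, are designed to avoid this failure mode.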
Pages: 905-922
Number of pages: 18