Probabilistic reduced K-means cluster analysis

Authors
Lee, Seunghoon [1]
Song, Juwon [1]
Affiliations
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; FACTORIAL;
DOI
10.5351/KJAS.2021.34.6.905
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, in which K-means cluster analysis is applied after the number of variables has been reduced by a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, an extension of reduced K-means (De Soete and Carroll, 1994) to a probabilistic framework. Simulation shows that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a similar or better cluster structure than other methods.
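The masking effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not implement probabilistic reduced K-means. It only shows, under assumed settings (one informative variable, five high-variance noise variables, two clusters, and a simple 1-D two-means helper named `two_means_1d` that is hypothetical), how tandem analysis with a single principal component can lose the cluster structure.

```python
import numpy as np

def two_means_1d(x, n_iter=100):
    """Lloyd's algorithm for K = 2 on a 1-D array; returns 0/1 labels."""
    # Deterministic start: one center at each extreme of the data.
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        new_centers = np.array([x[labels == k].mean() for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def accuracy(labels, truth):
    """Agreement with the true grouping, up to label switching."""
    agree = np.mean(labels == truth)
    return max(agree, 1.0 - agree)

# Assumed simulation settings (illustrative only, not taken from the paper).
rng = np.random.default_rng(0)
n_per_cluster, n_noise = 100, 5
truth = np.repeat([0, 1], n_per_cluster)

# One informative variable: cluster means -3 and +3, unit standard deviation.
informative = rng.normal(loc=np.where(truth == 0, -3.0, 3.0), scale=1.0)
# Noise variables: large variance, unrelated to the cluster structure.
noise = rng.normal(scale=10.0, size=(truth.size, n_noise))
X = np.column_stack([informative, noise])

# PCA via SVD of the centered data; rows of Vt are the loading vectors.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_scores = Xc @ Vt[0]

# Tandem analysis: K-means on the first principal component scores.
acc_tandem = accuracy(two_means_1d(pc1_scores), truth)
# Reference: K-means on the informative variable alone.
acc_informative = accuracy(two_means_1d(informative), truth)
loading_on_informative = abs(Vt[0, 0])

print(f"accuracy, tandem (PC1 + K-means): {acc_tandem:.2f}")
print(f"accuracy, informative variable:   {acc_informative:.2f}")
print(f"|PC1 loading| on informative var: {loading_on_informative:.2f}")
```

In this setup the first principal component is dominated by the noise variables, so clustering on its scores recovers the groups at roughly chance level, while the single informative variable separates them almost perfectly. Methods that choose the projection and the clusters jointly, such as reduced K-means, are designed to avoid this.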
Pages: 905-922
Number of pages: 18