Probabilistic reduced K-means cluster analysis

Times cited: 0
Authors
Lee, Seunghoon [1]
Song, Juwon [1]
Affiliation
[1] Korea University, Department of Statistics, 145 Anam-ro, Seoul 02841, South Korea
Keywords
cluster analysis; dimension reduction; unsupervised learning; EM-algorithm; high-dimension; factorial
DOI
10.5351/KJAS.2021.34.6.905
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used clustering techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis: K-means cluster analysis carried out after the number of variables has been reduced by a dimension reduction method. However, there is no guarantee that the reduced dimension reveals the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been proposed. This study proposes probabilistic reduced K-means, an extension of reduced K-means (De Soete and Carroll, 1994) to a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, the proposed method revealed a cluster structure similar to or better than those found by other methods.
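The abstract describes the deterministic reduced K-means model of De Soete and Carroll (1994) only in words. As a rough illustration of that baseline (not the authors' probabilistic, EM-based method, whose details are not given here), the sketch below minimizes ||Xc - U F A^T||^2 over a binary membership matrix U, reduced-space centroids F and column-orthonormal loadings A by alternating least squares. Given U, the loadings are the top eigenvectors of the between-cluster scatter matrix; given A, the assignment step is ordinary K-means on the projected data XA. The function name `reduced_kmeans`, the random-partition initialization, the multiple-start loop and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def reduced_kmeans(X, n_clusters, n_components, n_starts=10, n_iter=100, seed=0):
    """Illustrative sketch of reduced K-means via alternating least squares.

    Minimizes ||Xc - U F A^T||^2 over a binary membership matrix U (n x K),
    reduced-space centroids F (K x q) and column-orthonormal loadings A (p x q),
    where Xc is the column-centered data. Returns (labels, A, loss) of the best start.
    The initialization and restart scheme are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                          # column-center the data
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        labels = rng.integers(n_clusters, size=n)   # random initial partition
        for _ in range(n_iter):
            U = np.eye(n_clusters)[labels]          # n x K indicator matrix
            counts = U.sum(axis=0)
            means = (U.T @ X) / np.maximum(counts, 1.0)[:, None]  # K x p cluster means
            # A-step: top-q eigenvectors of the between-cluster scatter X^T P_U X,
            # where P_U = U (U^T U)^{-1} U^T projects onto the cluster-mean space.
            between = means.T @ (counts[:, None] * means)
            _, eigvecs = np.linalg.eigh(between)    # eigenvalues come in ascending order
            A = eigvecs[:, ::-1][:, :n_components]
            # F-step: centroids in the reduced space; U-step: nearest-centroid assignment of XA.
            Z = X @ A
            F = means @ A
            dist = ((Z[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        loss = float(((X - F[new_labels] @ A.T) ** 2).sum())
        if best is None or loss < best[2]:
            best = (new_labels, A, loss)
    return best


if __name__ == "__main__":
    # Hypothetical simulated data: the cluster structure lives in the first two variables,
    # while eight high-variance noise variables carry no cluster information.
    rng = np.random.default_rng(1)
    truth = np.repeat(np.arange(3), 50)
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
    signal = centers[truth] + rng.normal(size=(150, 2))
    noise = rng.normal(scale=4.0, size=(150, 8))
    X = np.hstack([signal, noise])
    labels, A, loss = reduced_kmeans(X, n_clusters=3, n_components=2)
    print("loss:", round(loss, 2))
    print("cross-tabulation of true (rows) vs. estimated (columns) labels:")
    print(np.array([[np.sum((truth == t) & (labels == k)) for k in range(3)] for t in range(3)]))
```

In this simulated setting the noise variables have larger variance than the cluster-bearing ones, so principal components computed first would tend to favor the noise, which is the masking effect the abstract attributes to tandem analysis. Reduced K-means instead re-estimates A from the between-cluster scatter of the current partition. The restart loop is included because alternating schemes of this kind can stop at local optima.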
Pages: 905-922
Number of pages: 18