A Variable Selection Procedure for K-Means Clustering

Cited by: 1
Author
Kim, Sung-Soo [1]
Affiliation
[1] Korea Natl Open Univ, Dept Informat Stat, Seoul 110791, South Korea
Keywords
K-means clustering; variable selection; Mojena's stopping rule; VS-KM; HINoV; adjusted Rand index
DOI
10.5351/KJAS.2012.25.3.471
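CLC number (Chinese Library Classification)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714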
Abstract
One of the most important problems in cluster analysis is selecting the variables that truly define the cluster structure while eliminating the noisy variables that mask it. Brusco and Cradit (2001) present VS-KM (variable-selection heuristic for K-means clustering), a procedure for selecting the true variables for K-means clustering based on the adjusted Rand index. This procedure starts with a fixed number of clusters in K-means and adds variables sequentially based on the adjusted Rand index. This paper presents an updated procedure that combines VS-KM with the automated K-means procedure provided by Kim (2009). The resulting automated variable selection procedure for K-means clustering recalculates the number of clusters and the initial cluster centers whenever a new variable is added, and adds variables based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The implemented R programs can be obtained from the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r".
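The abstract only names the ingredients (K-means, sequential variable addition, the adjusted Rand index), so the following is a minimal Python sketch of ARI-guided forward selection, not the author's R code and not Brusco and Cradit's exact VS-KM rule. The adjusted Rand index is the standard Hubert-Arabie formula. Everything else is a hypothetical simplification: the function names, the acceptance rule (keep a candidate if adding it leaves the partition largely unchanged, ARI >= `threshold`), the default threshold of 0.8, and the deterministic farthest-point seeding. Following the original VS-KM description, `k` is held fixed; the procedure in this paper would instead re-estimate the number of clusters and the initial centers (Kim, 2009) each time a variable is added.

```python
from collections import Counter
from math import comb  # Python 3.8+


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same n >= 2 items")
    # Pairs of items placed together in both partitions (contingency-table cells n_ij).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())  # pairs together in A
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())  # pairs together in B
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def farthest_point_init(pts, k):
    """Hypothetical deterministic seeding: first point, then repeatedly the farthest one."""
    centers = [pts[0]]
    while len(centers) < k:
        centers.append(max(pts, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(rows, cols, k, max_iter=100):
    """Plain Lloyd's K-means on the columns in `cols`; returns one label per row."""
    pts = [[r[c] for c in cols] for r in rows]
    centers = farthest_point_init(pts, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in pts]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def forward_select(rows, k, start, threshold=0.8):
    """Greedy ARI-guided forward selection (simplified illustration, not exact VS-KM).

    A candidate variable is accepted when the partition obtained with it added agrees
    with the current partition (ARI >= threshold); the loop stops at the first failure.
    """
    selected = list(start)
    base = kmeans(rows, selected, k)
    remaining = [c for c in range(len(rows[0])) if c not in selected]
    while remaining:
        scores = {c: adjusted_rand_index(base, kmeans(rows, selected + [c], k))
                  for c in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        selected.append(best)
        remaining.remove(best)
        base = kmeans(rows, selected, k)
    return selected


if __name__ == "__main__":
    # Toy data: columns 0 and 1 share a two-group structure; column 2 is large-scale noise.
    rows = [[10 * (i >= 6) + i % 2, 10 * (i >= 6) + (i + 1) % 2, 100 * (i % 2)]
            for i in range(12)]
    print(forward_select(rows, k=2, start=[0]))  # [0, 1]
```

On the toy data, adding column 1 reproduces the two-group partition (ARI = 1), while adding the noise column makes K-means split the rows by parity instead (ARI = -0.1), so the sketch keeps columns 0 and 1 and stops. The threshold and stopping rule are the main tuning assumptions here.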
Pages: 471-483
Number of pages: 13