A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures

Cited by: 6
Authors
Kim, Kyoungok [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol SeoulTech, Int Fus Sch, Informat Technol Management Programme, 232 Gongreungno, Seoul 139743, South Korea
Keywords
k-modes clustering; fuzzy k-modes clustering; weighted k-modes clustering; fuzzy weighted k-modes clustering; ALGORITHM; SIMILARITY
DOI
10.3233/JIFS-16157
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Partitioning a set of objects into groups or clusters is a fundamental task in data mining, and clustering is a popular approach to implementing partitioning. Among several clustering algorithms, the k-means algorithm is well known and widely applied in areas that involve only numerical attributes. The k-modes algorithm is an extension of the k-means algorithm that deals with categorical variables, and it has several variations such as fuzzy methods. This paper presents a new attribute weighting method for the k-modes algorithm that utilizes impurity measures such as entropy and Gini impurity. The proposed algorithm considers the distribution of attribute categories both within the same cluster and between different clusters. As a result, categorical variables that the new algorithm identifies as more important than others have a greater influence on the similarity calculation, which improves clustering performance, as confirmed by experiments.
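The abstract does not give the weighting formula, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes a hypothetical weight for each attribute equal to the reduction in impurity (entropy or Gini) when going from the whole data set to the individual clusters, normalized to sum to 1; the function names `attribute_weights` and `weighted_mismatch` are invented for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of category labels."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a list of category labels."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def attribute_weights(rows, labels, impurity=entropy):
    """Hypothetical weighting (an assumption, not the paper's formula).

    An attribute gets a large weight when its categories are pure inside
    each cluster but mixed over the whole data set, i.e. when it separates
    the clusters well. Weights are normalized to sum to 1.
    """
    n_rows, n_attrs = len(rows), len(rows[0])
    clusters = sorted(set(labels))
    raw = []
    for j in range(n_attrs):
        overall = impurity([r[j] for r in rows])
        # Size-weighted average impurity of attribute j inside each cluster.
        within = sum(
            (labels.count(k) / n_rows)
            * impurity([r[j] for r, lab in zip(rows, labels) if lab == k])
            for k in clusters
        )
        raw.append(max(overall - within, 0.0))
    total = sum(raw)
    if total == 0:
        # No attribute separates the clusters: fall back to equal weights.
        return [1.0 / n_attrs] * n_attrs
    return [w / total for w in raw]


def weighted_mismatch(x, mode, weights):
    """Weighted simple-matching dissimilarity between an object and a mode."""
    return sum(w for a, b, w in zip(x, mode, weights) if a != b)


# Toy data: colour follows the cluster labels, size does not.
rows = [("red", "s"), ("red", "l"), ("blue", "s"), ("blue", "l")]
labels = [0, 0, 1, 1]
weights = attribute_weights(rows, labels, impurity=entropy)
print(weights)
print(weighted_mismatch(("red", "s"), ("blue", "s"), weights))
```

In this toy data the colour attribute receives all of the weight, so a mismatch on colour dominates the dissimilarity while a mismatch on size is ignored. Passing `impurity=gini` swaps the impurity measure without changing the rest of the sketch.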
Pages: 979-990
Page count: 12
Related papers
11 items in total
  • [1] Adaptive soft subspace clustering combining within-cluster and between-cluster information
    Jin, Liying
    Zhao, Shengdun
    Zhang, Congcong
    Gao, Wei
    Dou, Yao
    Lu, Mengkang
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (03) : 3319 - 3330
  • [2] The k-modes type clustering plus between-cluster information for categorical data
    Bai, Liang
    Liang, Jiye
    [J]. NEUROCOMPUTING, 2014, 133 : 111 - 121
  • [3] Enhanced soft subspace clustering integrating within-cluster and between-cluster information
    Deng, Zhaohong
    Choi, Kup-Sze
    Chung, Fu-Lai
    Wang, Shitong
    [J]. PATTERN RECOGNITION, 2010, 43 (03) : 767 - 781
  • [4] CONVERSATION CLUSTERING BASED ON PLCA USING WITHIN-CLUSTER SPARSITY CONSTRAINTS
    Kawaguchi, Yohei
    Togami, Masahito
    [J]. 2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 619 - 623
  • [5] An Improved K-modes Clustering Algorithm Based on Intra-cluster and Inter-cluster Dissimilarity Measure
    Zhou, Hongfang
    Zhang, Yihui
    Liu, Yibin
    [J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE & APPLICATION TECHNOLOGY (ICCIA 2017), 2017, 74 : 410 - 418
  • [6] A new weighted fuzzy C-means clustering approach considering between-cluster separability
    Wu, Ziheng
    Li, Cong
    Zhou, Fang
    Liu, Lei
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (01) : 1017 - 1024
  • [7] An Entropy Regularization k-Means Algorithm with a New Measure of between-Cluster Distance in Subspace Clustering
    Xiong, Liyan
    Wang, Cheng
    Huang, Xiaohui
    Zeng, Hui
    [J]. ENTROPY, 2019, 21 (07)
  • [8] Analysis of recurrent gap time data using the weighted risk-set method and the modified within-cluster resampling method
    Luo, Xianghua
    Huang, Chiung-Yu
    [J]. STATISTICS IN MEDICINE, 2011, 30 (04) : 301 - 311
  • [9] Comment on "Enhanced soft subspace clustering integrating within-cluster and between-cluster information" by Z. Deng et al. (Pattern Recognition, vol. 43, pp. 767-781, 2010)
    Forghani, Yahya
    [J]. PATTERN RECOGNITION, 2018, 77 : 456 - 457
  • [10] A New Method to Determine Cluster Number Without Clustering for Every K Based on Ratio of Variance to Range in K-Means
    Ri, Yong Ae
    Kang, Chol Ryong
    Kim, Kuk Hyon
    Choe, Yong Myong
    Han, Un Chol
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022