A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures

Cited by: 6

Authors:

Kim, Kyoungok [1]

Affiliations:

[1] Seoul Natl Univ Sci & Technol SeoulTech, Int Fus Sch, Informat Technol Management Programme, 232 Gongreungno, Seoul 139743, South Korea

Source:

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS | 2017, Vol. 32, No. 01

Keywords:

k-modes clustering; fuzzy k-modes clustering; weighted k-modes clustering; fuzzy weighted k-modes clustering; ALGORITHM; SIMILARITY

DOI:

10.3233/JIFS-16157

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Partitioning a set of objects into groups or clusters is a fundamental task in data mining, and clustering is a popular approach to implementing partitioning. Among several clustering algorithms, the k-means algorithm is well known and widely applied in several areas that only handle numerical attributes. The k-modes algorithm is an extension of the k-means algorithm that deals with categorical variables, and it has several variations, such as fuzzy methods. This paper presents a new attribute weighting method for the k-modes algorithm that utilizes impurity measures such as entropy and Gini impurity. The proposed algorithm considers the distribution of categories of attributes both within the same cluster and between different clusters. As a result, categorical variables that the new algorithm identifies as more important than others have a significant influence on the similarity calculation, which improves clustering performance, as confirmed by experiments.
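As a rough illustration of the ingredients named in the abstract, the sketch below computes entropy and Gini impurity for one categorical attribute and uses per-attribute weights in a simple-matching dissimilarity of the kind used by k-modes. This record does not give the paper's weighting formula, so the weights here are supplied by hand, and the function names (entropy, gini, weighted_mismatch) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the category distribution in `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of the category distribution in `values`."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def weighted_mismatch(x, mode, weights):
    """Weighted simple-matching dissimilarity between an object and a cluster mode.

    Each attribute on which `x` and `mode` disagree contributes its weight.
    The weights are an input here (assumption); the paper derives them from
    within-cluster and between-cluster impurity, which is not reproduced.
    """
    return sum(w for a, b, w in zip(x, mode, weights) if a != b)


# One attribute observed inside a single cluster: a pure cluster has zero
# impurity, while an evenly mixed one has the maximum for two categories.
pure = ["red", "red", "red", "red"]
mixed = ["red", "red", "blue", "blue"]
print(entropy(pure), gini(pure))
print(entropy(mixed), gini(mixed))

# Two-attribute object compared with a cluster mode, using hand-picked weights.
print(weighted_mismatch(("red", "small"), ("red", "medium"), [0.7, 0.3]))
```

An attribute whose categories are nearly pure within each cluster but differ between clusters is the kind the abstract describes as more important; how the two impurity terms are combined into a weight is left to the paper itself.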

Citation

Pages: 979-990

Page count: 12

11 items in total

[1] Adaptive soft subspace clustering combining within-cluster and between-cluster information
Jin, Liying
Zhao, Shengdun
Zhang, Congcong
Gao, Wei
Dou, Yao
Lu, Mengkang
[J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (03) : 3319 - 3330
[2] The k-modes type clustering plus between-cluster information for categorical data
Bai, Liang
Liang, Jiye
[J]. NEUROCOMPUTING, 2014, 133 : 111 - 121
[3] Enhanced soft subspace clustering integrating within-cluster and between-cluster information
Deng, Zhaohong
Choi, Kup-Sze
Chung, Fu-Lai
Wang, Shitong
[J]. PATTERN RECOGNITION, 2010, 43 (03) : 767 - 781
[4] CONVERSATION CLUSTERING BASED ON PLCA USING WITHIN-CLUSTER SPARSITY CONSTRAINTS
Kawaguchi, Yohei
Togami, Masahito
[J]. 2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 619 - 623
[5] An Improved K-modes Clustering Algorithm Based on Intra-cluster and Inter-cluster Dissimilarity Measure
Zhou, Hongfang
Zhang, Yihui
Liu, Yibin
[J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE & APPLICATION TECHNOLOGY (ICCIA 2017), 2017, 74 : 410 - 418
[6] A new weighted fuzzy C-means clustering approach considering between-cluster separability
Wu, Ziheng
Li, Cong
Zhou, Fang
Liu, Lei
[J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (01) : 1017 - 1024
[7] An Entropy Regularization k-Means Algorithm with a New Measure of between-Cluster Distance in Subspace Clustering
Xiong, Liyan
Wang, Cheng
Huang, Xiaohui
Zeng, Hui
[J]. ENTROPY, 2019, 21 (07)
[8] Analysis of recurrent gap time data using the weighted risk-set method and the modified within-cluster resampling method
Luo, Xianghua
Huang, Chiung-Yu
[J]. STATISTICS IN MEDICINE, 2011, 30 (04) : 301 - 311
[9] Comment on "Enhanced soft subspace clustering integrating within-cluster and between-cluster information" by Z. Deng et al. (Pattern Recognition, vol. 43, pp. 767-781, 2010)
Forghani, Yahya
[J]. PATTERN RECOGNITION, 2018, 77 : 456 - 457
[10] A New Method to Determine Cluster Number Without Clustering for Every K Based on Ratio of Variance to Range in K-Means
Ri, Yong Ae
Kang, Chol Ryong
Kim, Kuk Hyon
Choe, Yong Myong
Han, Un Chol
[J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
