MD-SPKM: A set pair k-modes clustering algorithm for incomplete categorical matrix data

Cited by: 3
Authors
Zhang, Chunying [1, 3]
Gao, Ruiyan [1 ]
Wang, Jiahao [1 ]
Chen, Song [1 ]
Liu, Fengchun [2 ]
Ren, Jing [1 ]
Feng, Xiaoze [1 ]
Affiliations
[1] North China Univ Sci & Technol, Coll Sci, Tangshan, Hebei, Peoples R China
[2] North China Univ Sci & Technol, Qianan Coll, Tangshan, Hebei, Peoples R China
[3] Key Lab Data Sci & Applicat Hebei Prov, Tangshan, Hebei, Peoples R China
Keywords
Incomplete categorical matrix data; set pair information granule; k-modes; set pair distance; set pair k-modes
DOI
10.3233/IDA-205340
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To cluster incomplete categorical matrix data sets while accounting for the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm (MD-SPKM) is proposed. First, the theory of set pair information granules is introduced into k-modes clustering: by improving the distance formula of the traditional k-modes algorithm, a set pair distance measure between incomplete matrix samples is defined. Second, to capture the uncertain relationship between a sample and a cluster, the intra-cluster average distance is defined and a threshold formula is given for deciding whether a sample belongs to multiple clusters. The resulting set pair clustering consists of a positive region, a boundary region and a negative region. Finally, experiments on three selected data sets against four comparison algorithms show that the set pair k-modes clustering algorithm handles incomplete categorical matrix data sets effectively and achieves good clustering performance in terms of Accuracy, Recall, ARI and NMI.
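The abstract does not state the paper's formulas, so the following Python sketch only illustrates the general idea and should not be read as the MD-SPKM method itself. It assumes the usual set pair analysis split of a comparison into identity, discrepancy (uncertain) and contrary parts, treats each sample as a single categorical vector rather than a matrix, and uses `None` for missing values. The function names, the `uncertainty_weight` parameter and the threshold rule are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Value = Optional[str]  # None marks a missing attribute value


def connection_components(x: Sequence[Value], y: Sequence[Value]) -> Tuple[float, float, float]:
    """Return (identity, discrepancy, contrary) fractions for two samples.

    identity:    both values known and equal
    discrepancy: at least one value missing (uncertain)
    contrary:    both values known and different
    """
    if len(x) != len(y) or not x:
        raise ValueError("samples must be non-empty and of equal length")
    n = len(x)
    same = sum(1 for a, b in zip(x, y) if a is not None and b is not None and a == b)
    unknown = sum(1 for a, b in zip(x, y) if a is None or b is None)
    return same / n, unknown / n, (n - same - unknown) / n


def set_pair_distance(x: Sequence[Value], y: Sequence[Value],
                      uncertainty_weight: float = 0.5) -> float:
    """Assumed distance: contrary part plus a weighted share of the uncertain part."""
    _, discrepancy, contrary = connection_components(x, y)
    return contrary + uncertainty_weight * discrepancy


def assign_regions(sample: Sequence[Value], modes: Sequence[Sequence[Value]],
                   threshold: float) -> Tuple[List[int], List[int]]:
    """Three-way assignment, returning (positive, boundary) cluster indices.

    A sample whose distance to exactly one mode is within `threshold` of the
    minimum goes to that cluster's positive region; if several modes qualify,
    it goes to the boundary region of each. All other clusters are negative.
    """
    distances = [set_pair_distance(sample, m) for m in modes]
    best = min(distances)
    near = [k for k, d in enumerate(distances) if d - best <= threshold]
    return (near, []) if len(near) == 1 else ([], near)
```

Under these assumptions a missing value neither counts as a full match nor as a full mismatch, which is what lets incomplete samples take part in the distance computation without imputation.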
Pages: 1507-1524
Number of pages: 18
Related papers
  • [1] A fuzzy k-modes algorithm for clustering categorical data
    Huang, ZX
    Ng, MK
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 1999, 7 (04) : 446 - 452
  • [2] A Global K-modes Algorithm for Clustering Categorical Data
    Bai Tian
    Kulikowski, C. A.
    Gong Leiguang
    Yang Bin
    Huang Lan
    Zhou Chunguang
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2012, 21 (03) : 460 - 465
  • [3] A genetic k-modes algorithm for clustering categorical data
    Gan, GJ
    Yang, ZJ
    Wu, JH
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 195 - 202
  • [4] A weighting k-modes algorithm for subspace clustering of categorical data
    Cao, Fuyuan
    Liang, Jiye
    Li, Deyu
    Zhao, Xingwang
    [J]. NEUROCOMPUTING, 2013, 108 : 23 - 30
  • [5] A genetic fuzzy k-Modes algorithm for clustering categorical data
    Gan, G.
    Wu, J.
    Yang, Z.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 1615 - 1620
  • [6] Initialization of K-Modes Clustering for Categorical Data
    Li Tao-ying
    Chen Yan
    Jin Zhi-hong
    Li Ye
    [J]. 2013 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (ICMSE), 2013, : 107 - 112
  • [7] An efficient k-modes algorithm for clustering categorical datasets
    Dorman, Karin S.
    Maitra, Ranjan
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2022, 15 (01) : 83 - 97
  • [8] A MD fuzzy k-modes Algorithm for Clustering Categorical Matrix-Object Data; [基于分类型矩阵对象数据的MD fuzzy k-modes聚类算法]
    Li S.
    Zhang M.
    Cao F.
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (06) : 1325 - 1337
  • [9] Clustering categorical data: Soft rounding k-modes
    Gavva, Surya Teja
    Karthik, C. S.
    Punna, Sharath
    [J]. INFORMATION AND COMPUTATION, 2024, 296
  • [10] Clustering of Categorical Data Using Intuitionistic Fuzzy k-modes
    Mehta, Darshan
    Tripathy, B. K.
    [J]. PROCEEDINGS OF SIXTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2016), VOL 1, 2017, 546 : 254 - 263