Graph Enhanced Fuzzy Clustering for Categorical Data Using a Bayesian Dissimilarity Measure

Cited by: 12

Authors:

Zhang, Chuanbin [1,2]

Chen, Long [1]

Zhao, Yin-Ping [3]

Wang, Yingxu [1]

Chen, C. L. Philip [4]

Affiliations:

[1] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Macau 999078, Peoples R China

[2] Zhaoqing Univ, Sch Comp Sci & Software, Zhaoqing 526061, Peoples R China

[3] Northwestern Polytech Univ, Sch Software, Xian 710072, Peoples R China

[4] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China

Source:

IEEE TRANSACTIONS ON FUZZY SYSTEMS | 2023 / Vol. 31 / No. 03

Keywords:

Clustering algorithms; Probabilistic logic; Bayes methods; Manifolds; Linear programming; Kernel; Estimation; Bayesian methods; categorical data; fuzzy centroids; fuzzy clustering; graph; Kullback-Leibler (K-L) divergence; K-MEANS; ALGORITHM; INFORMATION; SIMILARITY; DISTANCE; MODEL;

DOI:

10.1109/TFUZZ.2022.3189831

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Categorical data are widely available in many real-world applications, and discovering valuable patterns in such data by clustering is of great importance. However, the lack of a well-defined quantitative relationship among categorical values makes traditional clustering approaches, which are usually developed for numerical data, perform poorly on categorical datasets. To solve this problem and boost the performance of clustering for categorical data, we propose a novel fuzzy clustering model in this article. First, by approximating the maximum a posteriori (MAP) estimation of a discrete distribution of data partition, a new fuzzy clustering objective function is designed for categorical data. The Bayesian dissimilarity measure is formulated in this objective to tackle the subtle relationships between categorical values efficiently. Then, to further enhance the performance of clustering, a novel Kullback-Leibler (KL) divergence-based graph regularization is integrated into the clustering objective to exploit prior knowledge about the datasets, for example, information about correlations between data points. The proposed model is solved by alternating optimization, and experimental results on synthetic and real-world datasets show that it outperforms classical and relevant state-of-the-art algorithms. We also present a parameter analysis of our approach and conduct a comprehensive study on the effectiveness of the Bayesian dissimilarity measure and the KL divergence-based graph regularization.

Pages: 810-824

Page count: 15

50 records in total

[1] Context-Based Geodesic Dissimilarity Measure for Clustering Categorical Data
Lee, Changki
Jung, Uk
[J]. APPLIED SCIENCES-BASEL, 2021, 11 (18):
[2] An effective dissimilarity measure for clustering of high-dimensional categorical data
Lee, Jeonghoon
Lee, Yoon-Joon
[J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 38 (03) : 743 - 757
[3] An effective dissimilarity measure for clustering of high-dimensional categorical data
Jeonghoon Lee
Yoon-Joon Lee
[J]. Knowledge and Information Systems, 2014, 38 : 743 - 757
[4] Fuzzy clustering of categorical data using fuzzy centroids
Kim, DW
Lee, KH
Lee, D
[J]. PATTERN RECOGNITION LETTERS, 2004, 25 (11) : 1263 - 1271
[5] Clustering Categorical Data Using an Extended Modularity Measure
Labiod, Lazhar
Grozavu, Nistor
Bennani, Younes
[J]. NEURAL INFORMATION PROCESSING: MODELS AND APPLICATIONS, PT II, 2010, 6444 : 310 - 320
[6] A Comparative Analysis of Dissimilarity Measures for Clustering Categorical Data
Xavier-Junior, Joao C.
Canuto, Anne M. P.
Almeida, Noriedson D.
Goncalves, Luiz M. G.
[J]. 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
[7] Learning-Based Dissimilarity for Clustering Categorical Data
Rivera Rios, Edgar Jacob
Angel Medina-Perez, Miguel
Lazo-Cortes, Manuel S.
Monroy, Raul
[J]. APPLIED SCIENCES-BASEL, 2021, 11 (08):
[8] Clustering Categorical Data via Ensembling Dissimilarity Matrices
Amiri, Saeid
Clarke, Bertrand S.
Clarke, Jennifer L.
[J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2018, 27 (01) : 195 - 208
[9] An association-based dissimilarity measure for categorical data
Le, SQ
Ho, TB
[J]. PATTERN RECOGNITION LETTERS, 2005, 26 (16) : 2549 - 2557
[10] A fuzzy relational clustering algorithm based on a dissimilarity measure extracted from data
Corsini, P
Lazzerini, B
Marcelloni, F
[J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2004, 34 (01) : 775 - 782
