Graph Enhanced Fuzzy Clustering for Categorical Data Using a Bayesian Dissimilarity Measure

Cited by: 12
Authors
Zhang, Chuanbin [1, 2]
Chen, Long [1]
Zhao, Yin-Ping [3]
Wang, Yingxu [1]
Chen, C. L. Philip [4]
Affiliations
[1] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Macau 999078, Peoples R China
[2] Zhaoqing Univ, Sch Comp Sci & Software, Zhaoqing 526061, Peoples R China
[3] Northwestern Polytech Univ, Sch Software, Xian 710072, Peoples R China
[4] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China
Keywords
Clustering algorithms; Probabilistic logic; Bayes methods; Manifolds; Linear programming; Kernel; Estimation; Bayesian methods; categorical data; fuzzy centroids; fuzzy clustering; graph; Kullback-Leibler (K-L) divergence; K-MEANS; ALGORITHM; INFORMATION; SIMILARITY; DISTANCE; MODEL
DOI
10.1109/TFUZZ.2022.3189831
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Categorical data are common in real-world applications, and discovering valuable patterns in such data through clustering is of great importance. However, categorical values lack a well-defined quantitative relationship, so traditional clustering approaches, which are usually developed for numerical data, perform poorly on categorical datasets. To solve this problem and improve clustering performance on categorical data, we propose a novel fuzzy clustering model in this article. First, by approximating the maximum a posteriori (MAP) estimation of a discrete distribution of the data partition, we design a new fuzzy clustering objective function for categorical data. A Bayesian dissimilarity measure is formulated in this objective to handle the subtle relationships between categorical values efficiently. Then, to further enhance clustering performance, a novel Kullback-Leibler (KL) divergence-based graph regularization is integrated into the clustering objective to exploit prior knowledge about the datasets, for example, information about correlations between data points. The proposed model is solved by alternating optimization, and experimental results on synthetic and real-world datasets show that it outperforms classical and relevant state-of-the-art algorithms. We also present a parameter analysis of our approach and conduct a comprehensive study of the effectiveness of the Bayesian dissimilarity measure and the KL divergence-based graph regularization.
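To make the idea of a KL divergence-based graph regularizer concrete, here is a minimal Python sketch. It is not the formulation from the article, which the abstract does not spell out. It assumes a generic penalty of the form sum over linked pairs of w_ij * KL(u_i || u_j), where u_i is the fuzzy membership vector of point i and w_ij is a graph edge weight. The function names `kl_divergence` and `graph_kl_penalty`, the smoothing constant `eps`, and the toy data are all hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is an assumed smoothing constant that avoids log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def graph_kl_penalty(memberships, weights):
    """Assumed penalty: sum of w_ij * KL(u_i || u_j) over ordered pairs i != j."""
    n = len(memberships)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and weights[i][j] > 0:
                total += weights[i][j] * kl_divergence(memberships[i],
                                                       memberships[j])
    return total

# Toy data: three points, two clusters (each row sums to 1).
U = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

# Graph linking points 0 and 1, whose memberships are similar.
W_similar = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
penalty_similar = graph_kl_penalty(U, W_similar)

# Graph linking points 0 and 2, whose memberships disagree.
W_dissimilar = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
penalty_dissimilar = graph_kl_penalty(U, W_dissimilar)
```

Under this assumed form, the penalty is small when linked points have similar memberships and grows when they disagree, so adding it to a clustering objective would push correlated points toward the same clusters.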
Pages: 810-824
Page count: 15
Related papers
50 in total
  • [41] A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional
    Chatzis, Sotirios P.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (07) : 8684 - 8689
  • [42] Incremental Clustering for Categorical Data Using Clustering Ensemble
    Li Taoying
    Chen Yan
    Qu Lili
    Mu Xiangwei
    [J]. PROCEEDINGS OF THE 29TH CHINESE CONTROL CONFERENCE, 2010, : 2519 - 2524
  • [43] An expressive dissimilarity measure for relational clustering using neighbourhood trees
    Sebastijan Dumančić
    Hendrik Blockeel
    [J]. Machine Learning, 2017, 106 : 1523 - 1545
  • [44] An expressive dissimilarity measure for relational clustering using neighbourhood trees
    Dumancic, Sebastijan
    Blockeel, Hendrik
    [J]. MACHINE LEARNING, 2017, 106 (9-10) : 1523 - 1545
  • [45] Goodman-Kruskal measure associated clustering for categorical data
    Huang, Wenxue
    Pan, Yuanyi
    Wu, Jianhong
    [J]. INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2012, 4 (04) : 334 - 360
  • [46] On Fuzzy Clustering for Incomplete Spherical Data and for Incomplete Multivariate Categorical Data
    Kanzawa, Yuchi
    [J]. 2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 638 - 643
  • [47] Fuzzy Clustering based on α-Divergence for Spherical Data and for Categorical Multivariate Data
    Kanzawa, Yuchi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015,
  • [48] Dissimilarity Based Principal Component Analysis Using Fuzzy Clustering
    Sato-Ilic, Mika
    [J]. INTEGRATED UNCERTAINTY MANAGEMENT AND APPLICATIONS, 2010, 68 : 453 - 464
  • [49] A fuzzy k-modes algorithm for clustering categorical data
    Huang, ZX
    Ng, MK
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 1999, 7 (04) : 446 - 452
  • [50] A study on a fuzzy clustering for mixed numerical and categorical incomplete data
    Furukawa, Takashi
    Ohnishi, Shin-ichi
    Yamanoi, Takahiro
    [J]. 2013 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY 2013), 2013, : 425 - 428