Integrated Rough Fuzzy Clustering for Categorical data Analysis

Cited by: 25
Authors
Saha, Indrajit [1]
Sarkar, Jnanendra Prasad [2,3]
Maulik, Ujjwal [3]
Affiliations
[1] Natl Inst Tech Teachers Training & Res, Dept Comp Sci & Engn, Kolkata 700106, India
[2] Vodafone India Ltd, Pune 411006, Maharashtra, India
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, India
Keywords
Categorical data; Cluster validity indices; Rough Fuzzy Clustering; Simulated Annealing; Genetic Algorithm; Random Forest; Sensitivity analysis; Statistical test; DATA SETS; ALGORITHM; EXTENSIONS
DOI
10.1016/j.fss.2018.02.007
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In recent times, advanced data mining research has focused largely on clustering categorical data, where attribute values have no natural ordering. To address this, the Rough Fuzzy K-Modes clustering technique was recently developed to handle imperfect information, i.e. indiscernibility (coarseness) and vagueness within the dataset. However, this technique has been observed to suffer from local optima because the initial cluster modes are chosen at random. Hence, in this paper, we propose an integrated clustering technique using multi-phase learning. First, Simulated Annealing based Rough Fuzzy K-Modes and Genetic Algorithm based Rough Fuzzy K-Modes are proposed to improve clustering by treating it as an underlying optimization problem. Each of these clustering methods produces clusters consisting of a set of central points and a set of peripheral points. For each method, the final improved clustering is then obtained by assigning the peripheral points to a particular crisp cluster using Random Forest, with the central points used as the training set. Second, the varying cardinality of the training and testing sets produced by each clustering method motivated us to propose a generalized technique called Integrated Rough Fuzzy Clustering using Random Forest, in which the results of the three aforementioned clustering techniques are used to compute the roughness measure. Based on this measure, three different sets are determined, namely best central points, semi-best central points and pure peripheral points. Thereafter, using multi-phase learning, the best central points are used to classify the semi-best central points, and then both of them are used to classify the pure peripheral points by Random Forest.
Experimental results are reported quantitatively and visually to demonstrate the effectiveness of the proposed methods in comparison with well-known state-of-the-art methods on six synthetic and five real-life datasets. Finally, statistical significance tests are conducted to establish the superiority of the results produced by the proposed methods. (C) 2018 Elsevier B.V. All rights reserved.
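
The multi-phase learning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: the roughness measure is replaced by a simple agreement rule over three base clusterings (all three agree on a central label: best central; two agree: semi-best central; otherwise: pure peripheral), cluster labels are assumed to be already aligned across the clusterings, and a nearest-mode classifier with simple matching dissimilarity stands in for Random Forest so that the example needs only the standard library. All function names are hypothetical.

```python
from collections import Counter

def simple_matching(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def fit_modes(records, labels):
    """Per-class mode vector: the most frequent value of each attribute."""
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    return {lab: tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))
            for lab, rows in by_class.items()}

def predict(modes, record):
    """Nearest-mode stand-in classifier (NOT Random Forest); ties go to the smallest label."""
    return min(sorted(modes), key=lambda lab: simple_matching(modes[lab], record))

def split_by_agreement(votes):
    """Assumed stand-in for the roughness-based split.

    votes[i] holds one label per base clustering, or None where that
    clustering left point i in a peripheral region.
    """
    best, semi, peripheral = [], [], []
    for i, v in enumerate(votes):
        central = [lab for lab in v if lab is not None]
        top = Counter(central).most_common(1)
        if len(central) == len(v) and top[0][1] == len(v):
            best.append(i)          # every clustering agrees on a central label
        elif top and top[0][1] >= 2:
            semi.append(i)          # at least two clusterings agree
        else:
            peripheral.append(i)    # no usable agreement
    return best, semi, peripheral

def multi_phase_assign(records, votes):
    """Phase 1: best central -> semi-best; phase 2: both -> pure peripheral."""
    best, semi, peripheral = split_by_agreement(votes)
    labels = {i: votes[i][0] for i in best}
    modes = fit_modes([records[i] for i in best], [labels[i] for i in best])
    for i in semi:
        labels[i] = predict(modes, records[i])
    known = best + semi
    modes = fit_modes([records[i] for i in known], [labels[i] for i in known])
    for i in peripheral:
        labels[i] = predict(modes, records[i])
    return [labels[i] for i in range(len(records))]

# Toy data (invented for illustration only).
records = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
    ("red", "round", "medium"),    # two of three clusterings agree
    ("blue", "square", "medium"),  # central in only one clustering
]
votes = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1), (0, 0, None), (None, 1, None)]

print(multi_phase_assign(records, votes))  # [0, 0, 1, 1, 0, 1]
```

The sketch assumes every cluster has at least one best central point; otherwise the first phase has nothing to train on. To move closer to the described method, the `fit_modes`/`predict` pair could be swapped for a Random Forest classifier trained on suitably encoded categorical attributes.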
Pages: 1-32
Page count: 32