Integrated Rough Fuzzy Clustering for Categorical Data Analysis

Cited by: 25
Authors
Saha, Indrajit [1]
Sarkar, Jnanendra Prasad [2, 3]
Maulik, Ujjwal [3]
Affiliations
[1] Natl Inst Tech Teachers Training & Res, Dept Comp Sci & Engn, Kolkata 700106, India
[2] Vodafone India Ltd, Pune 411006, Maharashtra, India
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, India
Keywords
Categorical data; Cluster validity indices; Rough Fuzzy Clustering; Simulated Annealing; Genetic Algorithm; Random Forest; Sensitivity analysis; Statistical test; DATA SETS; ALGORITHM; EXTENSIONS
DOI
10.1016/j.fss.2018.02.007
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Recent data mining research has largely focused on clustering categorical data, whose attribute values have no natural ordering. The Rough Fuzzy K-Modes clustering technique was recently developed to handle the imperfect information in such datasets, namely indiscernibility (coarseness) and vagueness. However, that technique suffers from local optima because its initial cluster modes are chosen at random. This paper therefore proposes an integrated clustering technique based on multi-phase learning. First, Simulated Annealing based Rough Fuzzy K-Modes and Genetic Algorithm based Rough Fuzzy K-Modes are proposed; both treat clustering as an underlying optimization problem in order to improve the result. Each of these methods produces clusters consisting of a set of central points and a set of peripheral points. In each case, the final improved clustering is then obtained by assigning the peripheral points to a particular crisp cluster using Random Forest, with the central points serving as the training set. Second, because the training and testing sets produced by each clustering method vary in cardinality, a generalized technique called Integrated Rough Fuzzy Clustering using Random Forest is proposed, in which the results of the three aforementioned clustering techniques are used to compute a roughness measure. Based on this measure, three sets are determined: best central points, semi-best central points, and pure peripheral points. Multi-phase learning then uses the best central points to classify the semi-best central points, and uses both sets together to classify the pure peripheral points with Random Forest.
Experimental results are reported quantitatively and visually to demonstrate the effectiveness of the proposed methods in comparison with well-known state-of-the-art methods on six synthetic and five real-life datasets. Finally, statistical significance tests are conducted to establish the superiority of the results produced by the proposed methods. © 2018 Elsevier B.V. All rights reserved.
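The two-phase assignment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the roughness measure, so the three point sets are simply taken as given inputs, and Random Forest is replaced by a standard-library stand-in that assigns each record to the nearest cluster mode under simple matching dissimilarity. All function names and the toy records are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_modes(records, labels):
    """Return, per cluster label, the most frequent value of each attribute."""
    by_cluster = {}
    for rec, lab in zip(records, labels):
        by_cluster.setdefault(lab, []).append(rec)
    return {
        lab: tuple(Counter(col).most_common(1)[0][0] for col in zip(*recs))
        for lab, recs in by_cluster.items()
    }


def nearest_mode_classifier(train_x, train_y, test_x):
    """Stand-in for Random Forest (assumption): pick the closest cluster mode."""
    modes = cluster_modes(train_x, train_y)
    return [
        min(sorted(modes), key=lambda lab: matching_dissimilarity(x, modes[lab]))
        for x in test_x
    ]


def multi_phase_assign(best, best_labels, semi_best, peripheral,
                       classify=nearest_mode_classifier):
    """Phase 1 labels semi-best points; phase 2 labels peripheral points."""
    # Phase 1: train on best central points, classify semi-best central points.
    semi_labels = classify(best, best_labels, semi_best)
    # Phase 2: train on both sets together, classify pure peripheral points.
    train_x = list(best) + list(semi_best)
    train_y = list(best_labels) + list(semi_labels)
    peripheral_labels = classify(train_x, train_y, peripheral)
    return semi_labels, peripheral_labels


# Hypothetical toy records: (colour, size, shape).
best = [
    ("red", "small", "round"), ("red", "small", "round"), ("red", "large", "oval"),
    ("blue", "large", "square"), ("blue", "large", "square"), ("blue", "small", "oval"),
]
best_labels = [0, 0, 0, 1, 1, 1]
semi_best = [("green", "small", "round"), ("green", "large", "square")]
peripheral = [("red", "small", "square"), ("blue", "small", "square")]

semi_labels, peripheral_labels = multi_phase_assign(
    best, best_labels, semi_best, peripheral
)
print(semi_labels, peripheral_labels)
```

The `classify` parameter is kept pluggable so that an actual Random Forest, applied to suitably encoded categorical attributes, could take the place of the stand-in without changing the two-phase structure.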
Pages: 1-32
Number of pages: 32