Rough set approach for clustering categorical data using information-theoretic dependency measure

Times cited: 30
Authors
Park, In-Kyoo [1]
Choi, Gyoo-Seok [2]
Affiliations
[1] Joongbu Univ, Dept Comp Sci, Kumsan Gun 312702, Chungnam, South Korea
[2] Chungwoon Univ, Dept Comp Sci, Inchon 402803, South Korea
Keywords
Clustering; Categorical data; Rough set theory; Information system; Attribute dependency; K-MEANS ALGORITHM;
DOI
10.1016/j.is.2014.06.008
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
A variety of clustering algorithms exist to group objects with similar characteristics, but many of them are difficult to apply to categorical data. Some algorithms cannot handle categorical data at all, while others cannot handle the uncertainty inherent in categorical data. Handling both is a prerequisite for clustering categorical data that contain uncertainty. An algorithm termed minimum-minimum roughness (MMR) was proposed that uses rough set theory to address these problems in clustering categorical data. Many algorithms have since been developed to improve the handling of hybrid data. This research proposes information-theoretic dependency roughness (ITDR), another technique for categorical data clustering that takes into account the information-theoretic dependency degree of attributes in categorical-valued information systems. In addition, it outperforms all of its predecessors: MMR, MMeR, SDR, and standard-deviation of standard-deviation roughness (SSDR). Experimental results on two benchmark UCI datasets show that the ITDR technique outperforms the baseline categorical data clustering technique in terms of computational complexity and cluster purity. (C) 2014 Elsevier Ltd. All rights reserved.
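The abstract contrasts roughness-based splitting-attribute selection (MMR and its successors) with an information-theoretic attribute dependency degree. As an illustration only, the sketch below computes two dependency measures between categorical attributes of an information system: Pawlak's classical rough-set dependency degree (the fraction of objects in the positive region) and, as an assumed information-theoretic stand-in, the normalized conditional-entropy dependency 1 - H(b | a) / H(b). It then picks a splitting attribute by maximum mean dependency. The exact ITDR formula and splitting rule are not given in this record, so the function names (`rough_dependency`, `entropy_dependency`, `select_split_attribute`), the toy `DEMO_ROWS` table, and the max-mean selection rule are hypothetical and should not be read as the paper's method.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attr):
    """Group object indices into equivalence classes by their value of `attr`."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[row[attr]].add(idx)
    return list(classes.values())


def rough_dependency(rows, cond, dec):
    """Pawlak dependency degree of `dec` on `cond`: |POS_cond(dec)| / |U|."""
    dec_classes = partition(rows, dec)
    positive = 0
    for block in partition(rows, cond):
        # A cond-block is in the positive region if it fits inside one dec-class.
        if any(block <= d for d in dec_classes):
            positive += len(block)
    return positive / len(rows)


def entropy(labels):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def entropy_dependency(rows, cond, dec):
    """ASSUMED info-theoretic dependency: 1 - H(dec | cond) / H(dec)."""
    h_dec = entropy([row[dec] for row in rows])
    if h_dec == 0:
        return 1.0  # dec is constant, so any attribute determines it trivially
    n = len(rows)
    h_cond = sum(
        len(block) / n * entropy([rows[i][dec] for i in block])
        for block in partition(rows, cond)
    )
    return 1 - h_cond / h_dec


def select_split_attribute(rows, attrs, dependency=entropy_dependency):
    """HYPOTHETICAL rule: pick the attribute that best determines the others.

    Requires at least two attributes. Ties go to the earliest attribute in `attrs`.
    """
    def mean_dep(a):
        others = [b for b in attrs if b != a]
        return sum(dependency(rows, a, b) for b in others) / len(others)

    return max(attrs, key=mean_dep)


# Toy categorical information system (hypothetical data, not from the paper).
DEMO_ROWS = [
    {"a": "x", "b": "p", "c": "u"},
    {"a": "x", "b": "p", "c": "v"},
    {"a": "y", "b": "q", "c": "u"},
    {"a": "y", "b": "q", "c": "v"},
]

if __name__ == "__main__":
    print(select_split_attribute(DEMO_ROWS, ["a", "b", "c"]))  # prints "a"
```

In the toy table, `a` fully determines `b` but says nothing about `c`, so both measures give 1.0 for a to b and 0.0 for a to c. MMR, by contrast, selects the attribute with the minimum roughness; a dependency-based rule maximizes instead, which is why the sketch uses `max`.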
Pages: 289-295
Number of pages: 7
Related papers
50 in total
  • [1] Rough set based information theoretic approach for clustering uncertain categorical data
    Uddin, Jamal
    Ghazali, Rozaida
    Abawajy, Jemal H.
    Shah, Habib
    Husaini, Noor Aida
    Zeb, Asim
    [J]. PLOS ONE, 2022, 17 (05):
  • [2] Rough Set Approach for Categorical Data Clustering
    Herawan, Tutut
    Yanto, Iwan Tri Riyadi
    Deris, Mustafa Mat
    [J]. DATABASE THEORY AND APPLICATION, 2009, 64 : 179 - 186
  • [3] An information-theoretic approach to hierarchical clustering of uncertain data
    Gullo, Francesco
    Ponti, Giovanni
    Tagarelli, Andrea
    Greco, Sergio
    [J]. INFORMATION SCIENCES, 2017, 402 : 199 - 215
  • [4] A rough set theoretic approach to clustering
    De, SK
    [J]. FUNDAMENTA INFORMATICAE, 2004, 62 (3-4) : 409 - 417
  • [5] Information-theoretic measures associated with rough set approximations
    Zhu, Ping
    Wen, Qiaoyan
    [J]. INFORMATION SCIENCES, 2012, 212 : 33 - 43
  • [6] A Hybrid Approach to Classification of Categorical Data Based on Information-Theoretic Context Selection
    Alamuri, Madhavi
    Surampudi, Bapi Raju
    Negi, Atul
    [J]. PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON FUZZY AND NEURO COMPUTING (FANCCO - 2015), 2015, 415 : 285 - 295
  • [7] An Information-Theoretic Measure of Dependency Among Variables in Large Datasets
    Mousavi, Ali
    Baraniuk, Richard G.
    [J]. 2015 53RD ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2015 : 650 - 657
  • [8] Information-theoretic measure of uncertainty in generalized fuzzy rough sets
    Mi, Ju-Sheng
    Li, Xiu-Min
    Zhao, Hui-Yin
    Feng, Tao
    [J]. ROUGH SETS, FUZZY SETS, DATA MINING AND GRANULAR COMPUTING, PROCEEDINGS, 2007, 4482 : 63 - +
  • [9] Information-theoretic clustering: A representative and evolutionary approach
    Araujo, Daniel
    Doria Neto, Adriao
    Martins, Allan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (10) : 4190 - 4205
  • [10] A Hierarchical Algorithm for Clustering Uncertain Data via an Information-Theoretic Approach
    Gullo, Francesco
    Ponti, Giovanni
    Tagarelli, Andrea
    Greco, Sergio
    [J]. ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008 : 821 - 826