Data Labeling method based on Cluster Purity using Relative Rough Entropy for Categorical Data Clustering

Cited by: 0
Authors
Reddy, H. Venkateswara [1]
Raju, S. Viswanadha [2]
Agrawal, Pratibha [3]
Affiliations
[1] Vardhaman Coll Engn, Dept Comp Sci & Engn, Hyderabad, Andhra Pradesh, India
[2] JNTUH Coll Engn, Dept Comp Sci & Engn, Nachupally, India
[3] Univ Delhi, Dept Comp Sci & Engn, New Delhi, India
Source
2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013
Keywords
Categorical Data; Clustering; Data Labeling; Outlier; Entropy; Rough Set; Cluster Purity
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering is an important technique in data mining, but clustering a large data set is difficult and time-consuming. Data labeling has been proposed as a way to cluster large databases more efficiently through sampling. A random sample is selected for initial clustering, and each data point that was not sampled, and is therefore still unclustered, is then assigned a cluster label or marked as an outlier by a data labeling technique. Data labeling is straightforward in the numerical domain because it relies on the distance between a cluster and an unlabeled data point. In the categorical domain, however, distance is not well defined either between data points or between a data point and a cluster, which makes data labeling difficult. In this paper, we propose a data labeling method for clustering categorical data that uses Relative Rough Entropy. Entropy, introduced by Shannon in the context of information theory, is a powerful mechanism for measuring uncertainty in information. The method performs data labeling by integrating entropy with rough sets, and it also uses cluster purity for outlier detection. The experimental results show that the efficiency and clustering quality of this algorithm are better than those of previous algorithms.
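The abstract does not give the formulas for Relative Rough Entropy or cluster purity, so the following is only a minimal sketch of the general data labeling idea, not the authors' method. It assumes a simple frequency-based similarity between a categorical data point and a cluster, and a hypothetical `threshold` below which a point is treated as an outlier; the names `shannon_entropy`, `similarity`, and `label_point` are illustrative. Shannon entropy is shown separately as the uncertainty measure the abstract refers to.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def similarity(point, cluster):
    """Average relative frequency of the point's attribute values in the cluster.

    This is an assumed stand-in measure, not Relative Rough Entropy.
    """
    score = 0.0
    for j, value in enumerate(point):
        matches = sum(1 for row in cluster if row[j] == value)
        score += matches / len(cluster)
    return score / len(point)


def label_point(point, clusters, threshold=0.5):
    """Return the best-matching cluster label, or "outlier" below the threshold."""
    best_label, best_score = None, -1.0
    for label, rows in clusters.items():
        s = similarity(point, rows)
        if s > best_score:
            best_label, best_score = label, s
    return best_label if best_score >= threshold else "outlier"


# Clusters obtained from the initial sample (toy data, two categorical attributes).
clusters = {
    "c1": [("a", "x"), ("a", "x"), ("a", "y")],
    "c2": [("b", "z"), ("b", "z"), ("c", "z")],
}
```

With these toy clusters, `label_point(("a", "x"), clusters)` picks `"c1"`, while a point whose values appear in no cluster, such as `("d", "w")`, falls below the threshold and is labeled `"outlier"`.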
Pages: 500-506
Page count: 7
Related papers
50 in total
  • [31] Hierarchical clustering algorithm for categorical data using a probabilistic rough set model
    Li, Min
    Deng, Shaobo
    Wang, Lei
    Feng, Shengzhong
    Fan, Jianping
    KNOWLEDGE-BASED SYSTEMS, 2014, 65 : 60 - 71
  • [32] A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
    Iam-On, Natthakan
    Boongoen, Tossapon
    Garrett, Simon
    Price, Chris
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (03) : 413 - 425
  • [33] Rough set based information theoretic approach for clustering uncertain categorical data
    Uddin, Jamal
    Ghazali, Rozaida
    Abawajy, Jemal H.
    Shah, Habib
    Husaini, Noor Aida
    Zeb, Asim
    PLOS ONE, 2022, 17 (05):
  • [34] Categorical Data Clustering with Automatic Selection of Cluster Number
    Liao, Hai-Yong
    Ng, Michael K.
    FUZZY INFORMATION AND ENGINEERING, 2009, 1 (01) : 5 - 25
  • [35] Rough set approach for categorical data clustering
    Herawan, Tutut
    Ghazali, Rozaida
    Yanto, Iwan Tri Riyadi
    Deris, Mustafa Mat
    International Journal of Database Theory and Application, 2010, 3 (01): : 33 - 52
  • [36] Incremental entropy-based clustering on categorical data streams with concept drift
    Li, Yanhong
    Li, Deyu
    Wang, Suge
    Zhai, Yanhui
    KNOWLEDGE-BASED SYSTEMS, 2014, 59 : 33 - 47
  • [37] Clustering Categorical Data Based on Representatives
    Aranganayagi, S.
    Thangavel, K.
    THIRD 2008 INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2008, : 599 - +
  • [38] Efficiency Based Categorical Data Clustering
    Kalaivani, K.
    Raghavendra, A. P. V.
    2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2012, : 550 - 553
  • [39] EDMD: An Entropy based Dissimilarity measure to cluster Mixed-categorical Data
    Kar, Amit Kumar
    Akhter, Mohammad Maksood
    Mishra, Amaresh Chandra
    Mohanty, Sraban Kumar
    PATTERN RECOGNITION, 2024, 155
  • [40] An New Algorithm-based Rough Set for Selecting Clustering Attribute in Categorical Data
    Baroud, Muftah Mohamed Jomah
    Hashim, Siti Zaiton Mohd
    Zainal, Anazida
    Ahnad, Jamilah
    2020 6TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2020, : 1358 - 1364