Data Labeling method based on Cluster Purity using Relative Rough Entropy for Categorical Data Clustering

Authors
Reddy, H. Venkateswara [1 ]
Raju, S. Viswanadha [2 ]
Agrawal, Pratibha [3 ]
Affiliations
[1] Vardhaman Coll Engn, Dept Comp Sci & Engn, Hyderabad, Andhra Pradesh, India
[2] JNTUH Coll Engn, Dept Comp Sci & Engn, Nachupally, India
[3] Univ Delhi, Dept Comp Sci & Engn, New Delhi, India
Keywords
Categorical Data; Clustering; Data Labeling; Outlier; Entropy; Rough Set; Cluster Purity
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Clustering is an important technique in data mining, but clustering a large data set is difficult and time consuming. Data labeling has been suggested as a way to cluster large databases more efficiently by means of sampling. A randomly selected sample is clustered first, and each data point that was not sampled, and is therefore unclustered, is then either assigned a cluster label or marked as an outlier by a data labeling technique. Data labeling is easy in the numerical domain because it can rely on the distance between a cluster and an unlabeled data point. In the categorical domain, however, distance is not well defined either between data points or between a data point and a cluster, so data labeling is difficult for categorical data. In this paper, we propose a data labeling method that uses Relative Rough Entropy for clustering categorical data. The concept of entropy, introduced by Shannon in the context of information theory, is a powerful mechanism for measuring uncertainty. In the proposed method, data labeling is performed by integrating entropy with rough sets, and cluster purity is used for outlier detection. The experimental results show that the efficiency and clustering quality of this algorithm are better than those of previous algorithms.
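The two ingredients named in the abstract, Shannon entropy over categorical values and labeling an unsampled point as either a cluster member or an outlier, can be illustrated with a minimal Python sketch. This is not the paper's Relative Rough Entropy or its cluster purity measure, neither of which is defined in this record. The function names, the frequency-based similarity score, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def label_point(point, clusters, threshold):
    """Assign `point` to the most similar cluster, or mark it as an outlier.

    Hypothetical stand-in score (not the paper's measure): the average, over
    attributes, of the relative frequency of the point's value in the cluster.
    `clusters` maps a label to a list of already clustered tuples.
    """
    best_label, best_score = None, 0.0
    for label, members in clusters.items():
        score = 0.0
        for j, value in enumerate(point):
            column = [m[j] for m in members]
            score += column.count(value) / len(column)
        score /= len(point)
        if score > best_score:
            best_label, best_score = label, score
    # Points that resemble no cluster well enough are treated as outliers.
    return best_label if best_score >= threshold else "outlier"


# Tiny sampled clustering over two categorical attributes (made-up data).
clusters = {
    "c1": [("red", "small"), ("red", "large")],
    "c2": [("blue", "large"), ("blue", "large")],
}
print(label_point(("red", "small"), clusters, threshold=0.5))
print(label_point(("green", "medium"), clusters, threshold=0.5))
```

A frequency-based score is used here only because it needs no distance function, which is the difficulty the abstract points out for categorical data.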
Pages: 500-506
Number of pages: 7
Related papers
50 in total
  • [1] Data Labeling method based on Rough Entropy for Categorical Data Clustering
    Sreenivasulu, G.
    Raju, S. Viswanadha
    Rao, N. Sambasiva
    2014 INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATION AND COMPUTATIONAL ENGINEERING (ICECCE), 2014, : 173 - 178
  • [2] A Data Labeling method for Categorical Data Clustering using Cluster Entropies in Rough Sets
    Reddy, H. Venkateswara
    Kumar, B. Suresh
    Raju, S. Viswanadha
    2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 444 - 449
  • [3] An Efficient Approach for Clustering US Census Data Based on Cluster Similarity Using Rough Entropy on Categorical Data
    Sreenivasulu, G.
    Raju, S. Viswanadha
    Rao, N. Sambasiva
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR COMPETITIVE STRATEGIES, 2019, 40 : 359 - 375
  • [4] A Roughset Based Data Labeling Method for Clustering Categorical Data
    Reddy, H. Venkateswara
    Raju, S. Viswanadha
    2014 3RD INTERNATIONAL CONFERENCE ON ECO-FRIENDLY COMPUTING AND COMMUNICATION SYSTEMS (ICECCS 2014), 2014, : 51 - 55
  • [5] A data labeling method for clustering categorical data
    Cao, Fuyuan
    Liang, Jiye
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (03) : 2381 - 2385
  • [6] On data labeling for clustering categorical data
    Chen, Hung-Leng
    Chuang, Kun-Ta
    Chen, Ming-Syan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) : 1458 - 1471
  • [7] Clustering Categorical Data Using Rough Membership Function
    Kumar, B. Suresh
    Reddy, H. Venkateswara
    Raju, T. Ankamma
    Vennam, Preethi
    2014 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS, 2014, : 602 - 607
  • [8] Ensemble based rough fuzzy clustering for categorical data
    Saha, Indrajit
    Sarkar, Jnanendra Prasad
    Maulik, Ujjwal
    KNOWLEDGE-BASED SYSTEMS, 2015, 77 : 114 - 127
  • [9] Fuzzy rough clustering for categorical data
    Xu, Shuliang
    Liu, Shenglan
    Zhou, Jian
    Feng, Lin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (11) : 3213 - 3223