Data Labeling method based on Cluster Purity using Relative Rough Entropy for Categorical Data Clustering

Cited by: 0

Authors:

Reddy, H. Venkateswara [1]

Raju, S. Viswanadha [2]

Agrawal, Pratibha [3]

Affiliations:

[1] Vardhaman Coll Engn, Dept Comp Sci & Engn, Hyderabad, Andhra Pradesh, India

[2] JNTUH Coll Engn, Dept Comp Sci & Engn, Nachupally, India

[3] Univ Delhi, Dept Comp Sci & Engn, New Delhi, India

Source:

2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI) | 2013

Keywords:

Categorical Data; Clustering; Data Labeling; Outlier; Entropy; Rough Set; Cluster Purity

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

Clustering is an important technique in data mining, but clustering a large data set is difficult and time-consuming. An approach called data labeling has been suggested for clustering large databases; it uses sampling to improve clustering efficiency. A sample is selected randomly for initial clustering, and each data point that was not sampled, and is therefore unclustered, is then either assigned a cluster label or marked as an outlier by a data labeling technique. Data labeling is easy in the numerical domain because it can be based on the distance between a cluster and an unlabeled data point. In the categorical domain, however, distance is not well defined between data points or between a data point and a cluster, so data labeling is difficult for categorical data. In this paper, we propose a data labeling method that uses Relative Rough Entropy for clustering categorical data. The concept of entropy, introduced by Shannon in the context of information theory, is a powerful mechanism for measuring uncertainty. In this method, data labeling is performed by integrating entropy with rough sets, and cluster purity is used for outlier detection. The experimental results show that the efficiency and clustering quality of this algorithm are better than those of previous algorithms.


Pages: 500-506

Page count: 7

