Data Labeling method based on Cluster Purity using Relative Rough Entropy for Categorical Data Clustering

Cited by: 0

Authors:

Reddy, H. Venkateswara ^{[1]}

Raju, S. Viswanadha ^{[2]}

Agrawal, Pratibha ^{[3]}

Affiliations:

[1] Vardhaman Coll Engn, Dept Comp Sci & Engn, Hyderabad, Andhra Pradesh, India

[2] JNTUH Coll Engn, Dept Comp Sci & Engn, Nachupally, India

[3] Univ Delhi, Dept Comp Sci & Engn, New Delhi, India

Source:

2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI) | 2013

Keywords:

Categorical Data; Clustering; Data Labeling; Outlier; Entropy; Rough Set; Cluster Purity

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Clustering is an important technique in data mining, but clustering a large data set is difficult and time-consuming. Data labeling has been suggested as a way to cluster large databases more efficiently through sampling: a random sample is selected for initial clustering, and each data point that was not sampled, and is therefore still unclustered, is then assigned a cluster label or marked as an outlier by a data labeling technique. Data labeling is easy in the numerical domain because it can be based on the distance between a cluster and an unlabeled data point. In the categorical domain, however, distance is not well defined either between data points or between a data point and a cluster, which makes data labeling difficult. In this paper, we propose a data labeling method for clustering categorical data that uses Relative Rough Entropy. The concept of entropy, introduced by Shannon in the context of information theory, is a powerful mechanism for measuring uncertainty. In this method, data labeling is performed by integrating entropy with rough sets, and cluster purity is used for outlier detection. The experimental results show that the efficiency and clustering quality of this algorithm are better than those of previous algorithms.
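
The abstract does not define Relative Rough Entropy or the cluster purity test, so the following is only a minimal sketch of the general idea of entropy-based data labeling for categorical data. It uses plain Shannon entropy as a stand-in, and a simple threshold in place of the purity-based outlier check. The function names and the `max_increase` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def cluster_entropy(cluster):
    """Sum of per-attribute entropies of a cluster.

    `cluster` is a list of equal-length tuples, one tuple per data point.
    """
    return sum(attribute_entropy(column) for column in zip(*cluster))


def label_point(point, clusters, max_increase):
    """Label an unclustered point (illustrative stand-in, not the paper's method).

    Returns the index of the cluster whose entropy grows least when `point`
    is added, or None (outlier) if even the smallest growth exceeds the
    assumed threshold `max_increase`.
    """
    best_index, best_increase = None, float("inf")
    for index, cluster in enumerate(clusters):
        increase = cluster_entropy(cluster + [point]) - cluster_entropy(cluster)
        if increase < best_increase:
            best_index, best_increase = index, increase
    return best_index if best_increase <= max_increase else None


# Two small clusters obtained from a hypothetical initial sample.
clusters = [
    [("a", "x"), ("a", "x"), ("a", "y")],
    [("b", "z"), ("b", "z"), ("b", "w")],
]
similar_label = label_point(("a", "x"), clusters, max_increase=1.0)
outlier_label = label_point(("c", "q"), clusters, max_increase=1.0)
```

Here a point that shares its attribute values with the first cluster is assigned to it, while a point whose values appear in neither cluster raises the entropy of both beyond the threshold and is treated as an outlier. No distance between data points is needed, which is the motivation the abstract gives for entropy-based labeling in the categorical domain.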


Pages: 500-506

Page count: 7

