A Roughset Based Data Labeling Method for Clustering Categorical Data

Cited by: 0
Authors
Reddy, H. Venkateswara [1]
Raju, S. Viswanadha [2]
Affiliations
[1] Vardhaman Coll Engn, Dept Comp Sci & Engn, Hyderabad, Andhra Pradesh, India
[2] JNTUH Coll Engn, Dept Comp Sci & Engn, Nachupally, Karimnagar, India
Keywords
Clustering; Data labeling; Categorical data; Outlier; Rough sets; Entropy
DOI
10.1109/Eco-friendly.2014.86
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Data mining is the process of finding analytical information in huge databases. Clustering is one of the efficient techniques in data mining, and it is performed based on the principle of similarity. Clustering a large database is a demanding and time-consuming task. For this reason, an approach called data labeling through sampling is used. Data labeling is the process of assigning the unsampled data objects to appropriate clusters. This approach makes clustering easier and improves its efficiency. In this method, a sample dataset is first chosen from the large database and clustered; once this initial clustering is complete, the unsampled data objects are compared with the resulting clusters. The similar data objects are then given the appropriate cluster labels, and the dissimilar ones are treated as outliers. Such data labeling methods are easy to carry out on numerical data but are complicated for categorical data, because no distance between data objects is defined. The proposed method uses a new and efficient data labeling technique to cluster categorical data based on cluster entropy in rough set theory. Experimental results show that the proposed algorithm is efficient and produces higher-quality clusters than previous clustering methods.
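The labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the rough-set cluster-entropy formula, so this sketch swaps in a simpler stand-in criterion: the mean relative frequency of an object's attribute values within each cluster. The function names (`value_frequencies`, `similarity`, `label_objects`), the `threshold` parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# NOTE: assumption-only illustration. The similarity below is a plain
# value-frequency score, NOT the paper's rough-set cluster-entropy measure.

def value_frequencies(cluster):
    """Per-attribute relative frequency of each categorical value in a cluster."""
    n = len(cluster)
    n_attrs = len(cluster[0])
    return [
        {v: c / n for v, c in Counter(row[j] for row in cluster).items()}
        for j in range(n_attrs)
    ]

def similarity(obj, freqs):
    """Mean relative frequency of the object's values within the cluster."""
    return sum(f.get(v, 0.0) for v, f in zip(obj, freqs)) / len(obj)

def label_objects(clusters, unsampled, threshold=0.5):
    """Give each unsampled object its most similar cluster label, or None (outlier)."""
    profiles = {name: value_frequencies(rows) for name, rows in clusters.items()}
    labels = []
    for obj in unsampled:
        best = max(profiles, key=lambda name: similarity(obj, profiles[name]))
        score = similarity(obj, profiles[best])
        # Objects that resemble no cluster well enough are treated as outliers.
        labels.append(best if score >= threshold else None)
    return labels

# Clusters obtained from the sampled data (hypothetical toy data).
clusters = {
    "c1": [("red", "small"), ("red", "medium"), ("red", "small")],
    "c2": [("blue", "large"), ("green", "large"), ("blue", "large")],
}
unsampled = [("red", "small"), ("blue", "large"), ("yellow", "tiny")]
labels = label_objects(clusters, unsampled)
print(labels)
```

Here the first two unsampled objects receive the labels of the clusters they resemble, and the third, which shares no attribute value with either cluster, falls below the threshold and is marked as an outlier. An entropy-based criterion such as the one the abstract refers to would replace only the `similarity` function; the sample, cluster, then label structure would stay the same.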
Pages: 51-55
Page count: 5