Data Reduction Method for Categorical Data Clustering

Authors
Rendon, Erendira [1]
Salvador Sanchez, J. [2]
Garcia, Rene A. [1]
Abundez, Itzel [1]
Gutierrez, Citlalih [1]
Gasca, Eduardo [1]
Affiliations
[1] Inst Tecnol Toluca, Lab Reconocimiento Patrones, Av Tecnol S-N, Metepec 52140, Mexico
[2] Univ Jaume I, Dept Llenguatges & Sist Informat, E-12071 Castelló de la Plana, Spain
Keywords
Categorical Attributes; K-modes Clustering Algorithm; Reduced database;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Categorical data clustering is an important part of data mining and has recently drawn attention from several researchers. As a step in data mining, however, clustering faces the problem of processing large amounts of data. This article offers a solution for categorical clustering algorithms that work with high volumes of data: a method that summarizes the database using a structure called the CM-tree. To test the method, the K-Modes and Click clustering algorithms were run on several databases. The experiments show that the proposed summarization method improves execution time without losing clustering quality.
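The summarize-then-cluster idea in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the CM-tree is built, so the `summarize` function below is a hypothetical stand-in that only collapses identical records into weighted representatives; it is not the authors' structure. The `k_modes` function is a plain K-Modes loop with simple matching dissimilarity, adapted here (as an assumption) to count each representative by its weight. All function names and the toy data are illustrative.

```python
from collections import Counter

def summarize(records):
    """Collapse identical records into (record, weight) pairs.
    Hypothetical stand-in for the CM-tree summary, which the abstract does not specify."""
    counts = Counter(tuple(r) for r in records)
    return list(counts.items())

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def weighted_mode(members):
    """Most frequent category per attribute, counting each record by its weight."""
    n_attrs = len(members[0][0])
    mode = []
    for j in range(n_attrs):
        tally = Counter()
        for rec, w in members:
            tally[rec[j]] += w
        mode.append(tally.most_common(1)[0][0])
    return tuple(mode)

def k_modes(summary, init_modes, max_iter=20):
    """K-Modes over weighted representatives; stops when the modes no longer change."""
    modes = [tuple(m) for m in init_modes]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each representative goes to its nearest mode.
        clusters = [[] for _ in modes]
        for rec, w in summary:
            nearest = min(range(len(modes)), key=lambda i: mismatch(rec, modes[i]))
            clusters[nearest].append((rec, w))
        # Update step: recompute each mode; an empty cluster keeps its old mode.
        new_modes = [weighted_mode(c) if c else modes[i] for i, c in enumerate(clusters)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, clusters

# Toy data (illustrative only): six records, four distinct ones.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
]
summary = summarize(records)
modes, clusters = k_modes(summary, init_modes=[records[0], records[3]])
```

In this sketch the clustering loop iterates over the four weighted representatives rather than the six raw records, which is the kind of saving a summary structure aims for; how much the real CM-tree compresses a database depends on details not given in this record.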
Pages: 143 / +
Page count: 2