Clustering based on compressed data for categorical and mixed attributes

Authors
Rendon, Erendira
Sanchez, Jose Salvador
Affiliations
[1] Inst Tecnol Toluca, Lab Reconocimiento Patrones, Metepec, Mexico
[2] Univ Jaume 1, Dept Llenguatges & Sistemes Informat, E-12071 Castellon de La Plana, Spain
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering in data mining is a discovery process that groups a set of data so as to maximize intra-cluster similarity and minimize inter-cluster similarity. Clustering becomes more challenging when the data are categorical and the available memory is smaller than the data set. In this paper, we introduce CBC (Clustering Based on Compressed Data), an extension of the BIRCH algorithm that is especially suited to very large databases and that can handle both categorical attributes and mixed features. The effectiveness and performance of the CBC procedure were compared with those of the well-known K-modes clustering algorithm, demonstrating that the CBC summary process does not affect the final clustering, while execution times can be drastically reduced.
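For readers unfamiliar with the K-modes baseline named in the abstract, the following is a minimal sketch of the standard K-modes idea (simple matching dissimilarity plus per-attribute modes). It is not the CBC procedure and not the authors' implementation; the function names, the toy records, and the deterministic choice of the first k distinct records as initial modes are all assumptions made for illustration.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: count the attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, max_iter=20):
    """Cluster categorical records (tuples) into at most k groups."""
    # Assumption: use the first k distinct records as the initial modes.
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break

    labels = []
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        # Update step: recompute each mode from its current members.
        new_modes = []
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_modes.append(column_modes(members) if members else modes[j])
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
    ("blue", "large", "oval"),
]
labels, modes = k_modes(records, k=2)
print(labels, modes)
```

Because every pass of this sketch revisits all records, its cost grows with the size of the data set, which is the setting in which the abstract reports that summarizing the data first can lessen execution time.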
Pages: 817 - 825
Number of pages: 9