A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA

Authors
Ohn Mar San
Van-Nam Huynh
Yoshiteru Nakamori
Affiliation
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Tatsunokuchi, Ishikawa 923-1292, Japan
Keywords
Cluster analysis; numeric data; categorical data; k-means algorithm
DOI
Not available
Chinese Library Classification
O241 [Numerical analysis]
Discipline classification code
070102
Abstract
Most earlier work on clustering focused on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve datasets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method for updating the "cluster centers" of objects described by mixed numeric and categorical attributes during the clustering process so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely credit approval and abalone.
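The abstract does not give the paper's similarity measure or its center-update rule, so the following is only a minimal sketch of the general idea under stated assumptions. It swaps in a k-prototypes-style dissimilarity (squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches) and a mean/mode center update; these are not the authors' definitions. The names `mixed_distance`, `update_center`, `cluster`, and the weight `gamma` are hypothetical.

```python
from collections import Counter


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Assumed dissimilarity: squared Euclidean part plus gamma * mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches


def update_center(members):
    """Assumed center update: mean per numeric attribute, mode per categorical one."""
    n = len(members)
    num_dim = len(members[0][0])
    cat_dim = len(members[0][1])
    c_num = tuple(sum(m[0][j] for m in members) / n for j in range(num_dim))
    c_cat = tuple(
        Counter(m[1][j] for m in members).most_common(1)[0][0]
        for j in range(cat_dim)
    )
    return c_num, c_cat


def cluster(data, initial_centers, gamma=1.0, max_iter=100):
    """k-means-style loop over objects given as (numeric_tuple, categorical_tuple)."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest center.
        new_labels = [
            min(
                range(len(centers)),
                key=lambda k: mixed_distance(
                    x[0], x[1], centers[k][0], centers[k][1], gamma
                ),
            )
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: recompute each non-empty cluster's center.
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = update_center(members)
    return labels, centers


if __name__ == "__main__":
    data = [((1.0,), ("a",)), ((1.2,), ("a",)), ((5.0,), ("b",)), ((5.2,), ("b",))]
    labels, centers = cluster(data, [data[0], data[2]])
    print(labels, centers)
```

With empty categorical tuples, `mixed_distance` reduces to the squared Euclidean distance and the loop becomes ordinary k-means, which mirrors the property stated in the abstract for purely numeric data.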
Pages: 562-571
Page count: 10