An Initialization Method for Clustering Mixed Numeric and Categorical Data Based on the Density and Distance

Cited by: 16
Authors
Ji, Jinchao [1, 2]
Pang, Wei [3, 4]
Zheng, Yanlin [1]
Wang, Zhe [2, 5]
Ma, Zhiqiang [1]
Affiliations
[1] NE Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Peoples R China
[2] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[3] Univ Aberdeen, Sch Nat & Comp Sci, Aberdeen AB24 3UE, Scotland
[4] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Jiangsu, Peoples R China
[5] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Clustering; data mining; mixed numeric and categorical data; cluster center initialization; ALGORITHM;
DOI
10.1142/S021800141550024X
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most initialization approaches are designed for partitional clustering algorithms that process only categorical or only numerical data. However, in real-world applications, data objects with both numeric and categorical features are ubiquitous. The coexistence of categorical and numerical attributes makes the initialization methods designed for single-type data inapplicable to mixed-type data. Furthermore, to the best of our knowledge, the existing partitional clustering algorithms designed for mixed-type data determine their initial cluster centers randomly. In this paper, we propose a novel initialization method for mixed data clustering. The proposed method exploits both distance and density to determine the initial cluster centers. Its performance is demonstrated by a series of experiments on three real-world datasets, in comparison with that of traditional initialization methods.
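The abstract does not say how density and distance are combined, so the following is only a generic sketch of a density-and-distance seeding heuristic for mixed data, not the authors' method. All names (`mixed_distance`, `init_centers`, `gamma`) are hypothetical. The dissimilarity is assumed to be a k-prototypes-style measure (squared Euclidean on numeric features plus a weighted count of categorical mismatches), and density is assumed to be the inverse of an object's mean dissimilarity to all other objects.

```python
from typing import List, Sequence, Tuple

# Assumed record layout: (numeric features, categorical features).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Record, b: Record, gamma: float = 1.0) -> float:
    """Squared Euclidean distance on the numeric part plus gamma times the
    number of mismatched categorical values (an assumed, k-prototypes-style
    dissimilarity)."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    categorical = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * categorical


def init_centers(data: List[Record], k: int, gamma: float = 1.0) -> List[int]:
    """Return the indices of k initial centers chosen by density and distance."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")

    # Pairwise dissimilarity matrix (O(n^2), fine for a small illustration).
    dist = [[mixed_distance(data[i], data[j], gamma) for j in range(n)]
            for i in range(n)]

    # Assumed density: objects with a small mean dissimilarity to the rest
    # of the data are treated as lying in dense regions.
    density = [1.0 / (1.0 + sum(row) / max(n - 1, 1)) for row in dist]

    # First center: the densest object.
    centers = [max(range(n), key=lambda i: density[i])]

    # Remaining centers: objects that are both dense and far from every
    # center chosen so far.
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        centers.append(max(
            candidates,
            key=lambda i: density[i] * min(dist[i][c] for c in centers),
        ))
    return centers
```

The returned indices could seed a partitional algorithm for mixed data in place of random initial centers. The product of density and minimum distance is one simple way to trade the two criteria off; other combinations are equally plausible.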
Pages: 16
Related papers
50 in total
  • [21] A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data
    Ji, Jinchao
    Pang, Wei
    Zhou, Chunguang
    Han, Xiao
    Wang, Zhe
    [J]. KNOWLEDGE-BASED SYSTEMS, 2012, 30 : 129 - 135
  • [22] Clustering categorical data based on distance vectors
    Zhang, P
    Wang, XG
    Song, PXK
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) : 355 - 367
  • [23] Fuzzy K-prototypes algorithm for clustering mixed numeric and categorical valued data
    Chen, Ning
    Chen, An
    Zhou, Long-Xiang
    [J]. Ruan Jian Xue Bao/Journal of Software, 2001, 12 (08): : 1107 - 1119
  • [24] An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets
    Zhang, Kang
    Gu, Xingsheng
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [25] Initialization of K-Modes Clustering for Categorical Data
    Li Tao-ying
    Chen Yan
    Jin Zhi-hong
    Li Ye
    [J]. 2013 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (ICMSE), 2013, : 107 - 112
  • [26] A Weight Entropy k-means Algorithm for Clustering Dataset with Mixed Numeric and Categorical Data
    Li, Taoying
    Chen, Yan
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 1, PROCEEDINGS, 2008, : 36 - 41
  • [27] Clustering based on compressed data for categorical and mixed attributes
    Rendon, Erendira
    Sanchez, Jose Salvador
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS, 2006, 4109 : 817 - 825
  • [28] Topological Machine Learning for Mixed Numeric and Categorical Data
    Wu, Chengyuan
    Hargreaves, Carol Anne
    [J]. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2021, 30 (05)
  • [29] Context-Based Distance Learning for Categorical Data Clustering
    Ienco, Dino
    Pensa, Ruggero G.
    Meo, Rosa
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS VIII, PROCEEDINGS, 2009, 5772 : 83 - 94
  • [30] A Two-Step Method for Clustering Mixed Categroical and Numeric Data
    Shih, Ming-Yi
    Jheng, Jar-Wen
    Lai, Lien-Fu
    [J]. JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2010, 13 (01): : 11 - 19