An Initialization Method for Clustering Mixed Numeric and Categorical Data Based on the Density and Distance

Cited by: 16
Authors
Ji, Jinchao [1, 2]
Pang, Wei [3, 4]
Zheng, Yanlin [1]
Wang, Zhe [2, 5]
Ma, Zhiqiang [1]
Affiliations
[1] NE Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Peoples R China
[2] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[3] Univ Aberdeen, Sch Nat & Comp Sci, Aberdeen AB24 3UE, Scotland
[4] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Jiangsu, Peoples R China
[5] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Clustering; data mining; mixed numeric and categorical data; cluster center initialization; ALGORITHM;
DOI
10.1142/S021800141550024X
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most initialization approaches are dedicated to partitional clustering algorithms that process only categorical or only numerical data. However, in real-world applications, data objects with both numeric and categorical features are ubiquitous. The coexistence of categorical and numerical attributes makes initialization methods designed for single-type data inapplicable to mixed-type data. Furthermore, to the best of our knowledge, the existing partitional clustering algorithms designed for mixed-type data determine their initial cluster centers randomly. In this paper, we propose a novel initialization method for mixed data clustering. The proposed method exploits both distance and density to determine the initial cluster centers. Its performance is demonstrated by a series of experiments on three real-world datasets, in comparison with traditional initialization methods.
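The abstract does not give the formulas behind the method, so the following is only an illustrative sketch of how density and distance might be combined to pick initial centers for mixed data. It is not the authors' algorithm. Everything in it is an assumption: the k-prototypes-style dissimilarity (squared Euclidean distance plus `gamma` times the number of categorical mismatches), the density definition (inverse of the mean dissimilarity to all other objects), the greedy density-times-distance score, and the hypothetical names `mixed_distance` and `init_centers`.

```python
from typing import List, Sequence, Tuple

# One data object: (numeric features, categorical features).
Obj = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Obj, b: Obj, gamma: float = 1.0) -> float:
    """Assumed dissimilarity: squared Euclidean distance on the numeric part
    plus gamma times the number of mismatched categorical values."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    categorical = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * categorical


def init_centers(data: List[Obj], k: int, gamma: float = 1.0) -> List[int]:
    """Return the indices of k objects chosen as initial cluster centers.

    Assumed heuristic: the first center is the densest object; each later
    center maximizes density times the distance to the nearest chosen center.
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")

    # Pairwise dissimilarity matrix (O(n^2); acceptable for a small sketch).
    dist = [[mixed_distance(data[i], data[j], gamma) for j in range(n)]
            for i in range(n)]

    # Assumed density: higher when the object is close to the others on average.
    density = [1.0 / (1.0 + sum(row) / max(n - 1, 1)) for row in dist]

    centers = [max(range(n), key=lambda i: density[i])]
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        # Favor objects that are both dense and far from every chosen center.
        best = max(candidates,
                   key=lambda i: density[i] * min(dist[i][c] for c in centers))
        centers.append(best)
    return centers


if __name__ == "__main__":
    toy = [
        ([0.0, 0.0], ["a"]), ([0.1, 0.0], ["a"]), ([0.0, 0.1], ["a"]),
        ([5.0, 5.0], ["b"]), ([5.1, 5.0], ["b"]), ([5.0, 5.1], ["b"]),
    ]
    print(init_centers(toy, k=2))
```

On the toy data, the two selected indices fall in different groups, which is the behavior a density-and-distance criterion is meant to encourage. In practice the numeric features would typically be normalized first so that `gamma` balances the two attribute types.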
Pages: 16
Related papers
50 in total
  • [1] A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
    Ohn Mar San
    Van-Nam Huynh
    Yoshiteru Nakamori
    [J]. Journal of Systems Science & Complexity, 2003, (04) : 562 - 571
  • [2] Entropy based clustering of data streams with mixed numeric and categorical values
    Wang, Shuyun
    Fan, Yingjie
    Zhang, Chenghong
    Xu, HeXiang
    Hao, Xiulan
    Hu, Yunfa
    [J]. 7TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE IN CONJUNCTION WITH 2ND IEEE/ACIS INTERNATIONAL WORKSHOP ON E-ACTIVITY, PROCEEDINGS, 2008, : 140 - +
  • [3] Clustering Mixed Numeric and Categorical Data With Cuckoo Search
    Ji, Jinchao
    Pang, Wei
    Li, Zairong
    He, Fei
    Feng, Guozhong
    Zhao, Xiaowei
    [J]. IEEE ACCESS, 2020, 8 : 30988 - 31003
  • [4] A new initialization method for clustering categorical data
    Wu, Shu
    Jiang, Qingshan
    Huang, Joshua Zhexue
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 972 - +
  • [5] A new initialization method for categorical data clustering
    Cao, Fuyuan
    Liang, Jiye
    Bai, Liang
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (07) : 10223 - 10228
  • [6] Algorithm for fuzzy clustering of mixed data with numeric and categorical attributes
    Ahmad, A
    Dey, L
    [J]. DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, PROCEEDINGS, 2005, 3816 : 561 - 572
  • [7] A SURVEY ON CLUSTERING METHODS FOR NUMERIC, CATEGORICAL AND MIXED VARIABLES DATA
    Nisha
    Hooda, B. K.
    [J]. INTERNATIONAL JOURNAL OF AGRICULTURAL AND STATISTICAL SCIENCES, 2022, 18 (02) : 675 - 679
  • [8] A cluster centers initialization method for clustering categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    Cao, Fuyuan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (09) : 8022 - 8029
  • [9] A Support Based Initialization Algorithm for Categorical Data Clustering
    Kumar, Ajay
    Kumar, Shishir
    [J]. JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2018, 11 (02) : 53 - 67
  • [10] A Multi-View Clustering Algorithm for Mixed Numeric and Categorical Data
    Ji, Jinchao
    Li, Ruonan
    Pang, Wei
    He, Fei
    Feng, Guozhong
    Zhao, Xiaowei
    [J]. IEEE ACCESS, 2021, 9 : 24913 - 24924