Fast Density Clustering Algorithm for Numerical Data and Categorical Data

Cited by: 9
Authors
Chen Jinyin [1]
He Huihao [1]
Chen Jungan [2]
Yu Shanqing [1]
Shi Zhaoxia [1]
Affiliations
[1] Zhejiang Univ Technol, Hangzhou 310023, Zhejiang, Peoples R China
[2] Ningbo Wanli Univ, Dept Elect Engn, Ningbo 310023, Zhejiang, Peoples R China
Funding
Natural Science Foundation of Zhejiang Province; National Natural Science Foundation of China
Keywords
MIXED DATA;
DOI
10.1155/2017/6393652
Chinese Library Classification number
T [Industrial Technology];
Subject classification code
08 ;
Abstract
Real-world data objects often have mixed numerical and categorical attributes. Most existing algorithms for such data suffer from low clustering quality, difficulty in determining cluster centers, and sensitivity to initial parameters. A fast density clustering algorithm (FDCA) is put forward, based on a one-time scan, with cluster centers determined automatically by a center set algorithm (CSA). A novel data similarity metric is designed for clustering data that include both numerical and categorical attributes. CSA chooses cluster centers from the data objects automatically, which overcomes the difficulty of setting cluster centers found in most clustering algorithms. The performance of the proposed method is verified through a series of experiments on ten mixed data sets, in comparison with several other clustering algorithms, in terms of clustering purity, efficiency, and time complexity.
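The abstract does not spell out the similarity metric, so the following is only a minimal sketch of how a distance over mixed numerical and categorical attributes can be built. It uses a generic Gower-style average (range-normalized absolute difference for numerical attributes, simple mismatch for categorical ones), which is an assumption chosen for illustration and not the metric proposed in the paper. The function name `mixed_distance` and the `numeric_ranges` parameter are hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def mixed_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    numeric_ranges: Sequence[Optional[float]],
) -> float:
    """Gower-style distance in [0, 1] between two mixed-attribute objects.

    Illustrative assumption, not the paper's metric.
    numeric_ranges[i] is the value range (max - min) of attribute i when it
    is numerical, or None when attribute i is categorical.
    """
    if not a or not (len(a) == len(b) == len(numeric_ranges)):
        raise ValueError("objects must be non-empty and of equal length")

    total = 0.0
    for x, y, value_range in zip(a, b, numeric_ranges):
        if value_range is None:
            # Categorical attribute: 0 on a match, 1 on a mismatch.
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            # Numerical attribute: absolute difference scaled by its range.
            total += abs(float(x) - float(y)) / value_range
        # A constant numerical attribute (range 0) contributes nothing.
    return total / len(a)


if __name__ == "__main__":
    # One numerical attribute with range 4.0 and one categorical attribute.
    print(mixed_distance((1.0, "red"), (3.0, "blue"), (4.0, None)))
```

Averaging per-attribute terms keeps both attribute types on the same 0 to 1 scale, so neither type dominates a density estimate computed from these distances.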
Pages: 15