A novel density peaks clustering algorithm for mixed data

Cited by: 46
Authors
Du, Mingjing [1]
Ding, Shifei [1, 2]
Xue, Yu [3]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Jiangsu, Peoples R China
Keywords
Data clustering; Density peaks; Entropy; Mixed data; Similarity
DOI
10.1016/j.patrec.2017.07.001
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The density peaks clustering (DPC) algorithm is well known for its effectiveness on data sets with non-spherical distributions. However, it works only on numerical values, which prevents it from being used to cluster real-world data that contain both categorical and numerical values. Traditional clustering algorithms for mixed data rely on pre-processing based on binary encoding, but such methods destroy the original structure of categorical attributes. Other solutions based on simple matching, such as K-Prototypes, need a user-defined parameter to avoid favoring either type of attribute. To overcome these problems, we present a novel clustering algorithm for mixed data, called DPC-MD. We improve DPC by using a new similarity criterion that handles three types of data: numerical, categorical, and mixed. Compared with other methods for mixed data, DPC has clear advantages in dealing with non-spherically distributed data. In addition, the core of the proposed method is a new similarity measure for mixed data, designed to avoid feature transformation and parameter adjustment. The performance of our method is demonstrated by experiments on several real-world data sets in comparison with traditional clustering algorithms such as K-Modes, K-Prototypes, EKP, and SBAC. © 2017 Elsevier B.V. All rights reserved.
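As a rough illustration of the idea summarized in the abstract, the sketch below runs a basic density peaks procedure on a precomputed distance matrix built from mixed records. It is a minimal sketch under stated assumptions, not the DPC-MD method itself: the abstract does not give the paper's similarity measure, so `mixed_distance` is a hypothetical Gower-style stand-in (range-scaled numeric difference plus simple matching), and the cutoff `d_c`, the Gaussian density kernel, and the fixed `n_clusters` are illustrative choices.

```python
import math

def mixed_distance(a, b, num_idx, cat_idx, ranges):
    """Hypothetical stand-in distance (Gower-style), NOT the DPC-MD measure."""
    total = 0.0
    for j in num_idx:  # numeric attributes: absolute difference scaled by range
        total += abs(a[j] - b[j]) / ranges[j] if ranges[j] > 0 else 0.0
    for j in cat_idx:  # categorical attributes: simple matching (0 or 1)
        total += 0.0 if a[j] == b[j] else 1.0
    return total / (len(num_idx) + len(cat_idx))

def density_peaks(dist, d_c, n_clusters):
    """Basic density peaks clustering on a symmetric distance matrix."""
    n = len(dist)
    # Local density rho_i, here with a Gaussian kernel of width d_c (assumed).
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest point first
    rank = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # that nearest higher-density point
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers: largest rho * delta; ties are broken in favor of higher density.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))
    labels = [-1] * n
    for c, i in enumerate(centers[:n_clusters]):
        labels[i] = c
    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Toy mixed records: (numeric value, category). Purely illustrative data.
data = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (5.0, "b"), (5.1, "b"), (4.9, "b")]
num_idx, cat_idx = [0], [1]
ranges = {j: max(r[j] for r in data) - min(r[j] for r in data) for j in num_idx}
dist = [[mixed_distance(a, b, num_idx, cat_idx, ranges) for b in data]
        for a in data]
labels = density_peaks(dist, d_c=0.1, n_clusters=2)
print(labels)
```

On this toy input the first three records and the last three records should end up in two separate clusters. Working from a distance matrix keeps the clustering step independent of the distance definition, so a different mixed-data measure could be substituted for `mixed_distance` without touching `density_peaks`.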
Pages: 46 - 53
Page count: 8
Related papers
50 in total
  • [21] An Ensemble Learning Algorithm Based on Density Peaks Clustering and Fitness for Imbalanced Data
    Xu, Hui
    Liu, Qicheng
    IEEE ACCESS, 2022, 10 : 116120 - 116128
  • [22] A spectral clustering algorithm based on attribute fluctuation and density peaks clustering algorithm
    Xin Song
    Shuhua Li
    Ziqiang Qi
    Jianlin Zhu
    Applied Intelligence, 2023, 53 : 10520 - 10534
  • [23] A spectral clustering algorithm based on attribute fluctuation and density peaks clustering algorithm
    Song, Xin
    Li, Shuhua
    Qi, Ziqiang
    Zhu, Jianlin
    APPLIED INTELLIGENCE, 2023, 53 (09) : 10520 - 10534
  • [24] An Improved Density Peaks Clustering Algorithm Based On Density Ratio
    Zou, Yujuan
    Wang, Zhijian
    Xu, Pengfei
    Lv, Taizhi
    COMPUTER JOURNAL, 2024, 67 (07) : 2515 - 2528
  • [25] An improved density peaks method for data clustering
    Lotfi, Abdulrahman
    Seyedi, Seyed Amjad
    Moradi, Parham
    2016 6TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2016 : 263 - 268
  • [26] A study on Two-Stage Mixed Attribute Data Clustering Based on Density Peaks
    Liu, Shihua
    Zhang, Hao
    Liu, Xianghua
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2021, 18 (05) : 634 - 643
  • [27] A Clustering Algorithm for Binary Protocol Data Frames Based on Principal Component Analysis and Density Peaks Clustering
    Yan, Xiaoyong
    Li, Qing
    Tao, Siyu
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT 2017), 2017 : 1260 - 1266
  • [28] An Algorithm of Clustering by Density Peaks Using in Anomaly Detection
    Yin, Chunyong
    Zhang, Sun
    Yin, Zhichao
    Wang, Jin
    INTERNATIONAL JOURNAL OF SECURITY AND ITS APPLICATIONS, 2015, 9 (12) : 115 - 127
  • [29] Coflow scheduling algorithm based density peaks clustering
    Li, Chenghao
    Zhang, Huyin
    Zhou, Tianying
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 97 : 805 - 813
  • [30] Cosine kernel based density peaks clustering algorithm
    Wang, Jiayuan
    Lv, Li
    Wu, Runxiu
    Fan, Tanghuai
    Lee, Ivan
    INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2020, 12 (01) : 1 - 20