Overcoming weaknesses of density peak clustering using a data-dependent similarity measure

Cited by: 8
Authors
Rasool, Zafaryab [1 ]
Aryal, Sunil [1 ]
Bouadjenek, Mohamed Reda [1 ]
Dazeley, Richard [1 ]
Affiliation
[1] Deakin Univ, Sch Informat Technol, Waurn Ponds Campus, Geelong, Vic 3216, Australia
Keywords
Clustering; Density peak clustering; Similarity measure; Data-dependent similarity
DOI
10.1016/j.patcog.2022.109287
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Density Peak Clustering (DPC) is a popular state-of-the-art clustering algorithm that requires the pairwise (dis)similarity of data objects to detect arbitrarily shaped clusters. While it has been shown to perform well in many applications, DPC remains (i) not robust on datasets whose clusters have different densities, and (ii) sensitive to changes in the units/scales used to represent data. These drawbacks stem mainly from its use of a data-independent similarity measure based on the Euclidean distance. In this paper, we address these issues by proposing an effective data-dependent similarity measure based on Probability Mass, which we call MP-Similarity, and by incorporating it into DPC to create MP-DPC, a data-dependent variant of DPC. We evaluate and compare MP-DPC against diverse baselines using several clustering metrics and datasets. Our experiments demonstrate that: (a) MP-DPC produces better clustering results than DPC using the Euclidean distance and existing data-dependent similarity measures; (b) coupling MP-Similarity with a Shared-Nearest-Neighbor-based density metric in DPC further enhances the quality of clustering results; and (c) unlike DPC with existing data-independent and data-dependent similarity measures, MP-DPC is robust to changes in the units/scales used to represent data. Our findings suggest that MP-Similarity provides a more viable solution for DPC on datasets with unknown distributions or feature units/scales, which is often the case in many real-world applications.
(c) 2022 Elsevier Ltd. All rights reserved.
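Because DPC consumes only a matrix of pairwise (dis)similarities, replacing the Euclidean distance with a data-dependent measure leaves the rest of the algorithm unchanged. The Python sketch below illustrates this under stated assumptions. `dpc` follows the basic cutoff-kernel formulation of Rodriguez and Laio (2014): a local density rho, a distance delta to the nearest denser point, and centers chosen by the largest rho * delta. `mass_dissimilarity` is a hypothetical per-feature probability-mass measure in the spirit of m_p-dissimilarity: on each feature, the fraction of points lying between the two values, combined with a p-norm average. It is not the paper's MP-Similarity, whose exact definition the abstract does not give, and the function names and the parameters `d_c` and `p` are illustrative.

```python
import numpy as np


def euclidean(X):
    """Pairwise Euclidean distance matrix (data-independent baseline)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mass_dissimilarity(X, p=2.0):
    """Hypothetical probability-mass dissimilarity (an assumption, not MP-Similarity).

    For each feature, the dissimilarity of two objects is the fraction of the
    n points whose value lies in [min(x_f, y_f), max(x_f, y_f)]; the per-feature
    masses are combined with a p-norm average over the d features.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    acc = np.zeros((n, n))
    for f in range(d):
        col = X[:, f]
        srt = np.sort(col)
        lo = np.minimum.outer(col, col)
        hi = np.maximum.outer(col, col)
        # Count points with lo <= value <= hi via binary search on the sorted column.
        count = np.searchsorted(srt, hi, side="right") - np.searchsorted(srt, lo, side="left")
        acc += (count / n) ** p
    return (acc / d) ** (1.0 / p)


def dpc(D, d_c, n_clusters):
    """Basic density peak clustering on a precomputed dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    off = ~np.eye(n, dtype=bool)  # exclude each point's self-dissimilarity
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = ((D < d_c) & off).sum(axis=1)
    order = np.argsort(-rho, kind="stable")  # decreasing density, ties by index
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Delta of the densest point is its largest dissimilarity to any other point.
            delta[i] = D[i, off[i]].max()
            continue
        higher = order[:rank]  # points ahead of i in the density order
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Non-centers inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
    X_rescaled = X * np.array([1.0, 1000.0])  # second feature in a 1000x smaller unit
    print(dpc(mass_dissimilarity(X), d_c=0.6, n_clusters=2))  # [0 0 0 1 1 1]
    print(np.allclose(mass_dissimilarity(X), mass_dissimilarity(X_rescaled)))  # True
```

Since this mass measure depends only on the ordering of values within each feature, multiplying a feature by a positive constant (for example, switching its unit from metres to millimetres) leaves the matrix, and therefore the DPC output, unchanged. The Euclidean matrix has no such invariance, which is the kind of unit/scale sensitivity the abstract attributes to standard DPC.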
Pages: 14
Related papers
  • [1] Overcoming Key Weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure
    Ting, Kai Ming
    Zhu, Ye
    Carman, Mark
    Zhu, Yue
    Zhou, Zhi-Hua
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1205 - 1214
  • [2] Data-Dependent Sparsity for Subspace Clustering
    Xin, Bo
    Wang, Yizhou
    Gao, Wen
    Wipf, David
    [J]. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI2017), 2017
  • [3] A novel similarity measure for data clustering
    Yao, Yuhui
    Chen, Yan Qiu
    Chen, Lihui
    [J]. Intelligent Data Analysis, 2000, 4 (05) : 421 - 431
  • [4] Similarity Learning via Optimizing the Data-Dependent Kernel
    Xiong, Huilin
    Shi, Panfei
    [J]. 2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS, 2009, : 512 - 516
  • [5] Generalized Similarity Measure for Categorical Data Clustering
    Sharma, Shruti
    Singh, Manoj
    [J]. 2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 765 - 769
  • [6] Hierarchical Clustering Using Homogeneity as Similarity Measure for Big Data Analytics
    Zhao, Yunwei
    Chi, Chi-Hung
    Ding, Chen
    Wong, Raymond
    Zhou, Wei
    Wang, Can
    [J]. 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2015), 2015, : 348 - 354
  • [7] Feature Selection Based on Density Peak Clustering Using Information Distance Measure
    Cai, Jie
    Chao, Shilong
    Yang, Sheng
    Wang, Shulin
    Luo, Jiawei
    [J]. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2017, PT II, 2017, 10362 : 125 - 131
  • [8] A Similarity Measure for Clustering Gene Expression Data
    Baishya, Ram Charan
    Sarmah, Rosy
    Bhattacharyya, Dhruba Kumar
    Dutta, Malay Ananda
    [J]. APPLIED ALGORITHMS, 2014, 8321 : 245 - 256
  • [9] An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering
    Zhang, Baojie
    Cao, Yang
    Zhu, Ye
    Rajasegarar, Sutharshan
    Liu, Gang
    Li, Hong Xian
    Angelova, Maia
    Li, Gang
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT I, 2023, 13935 : 197 - 209