Robust clustering by identifying the veins of clusters based on kernel density estimation

Cited by: 28
Authors
Zhou, Zhou [1 ]
Si, Gangquan [1 ]
Zhang, Yanbin [1 ]
Zheng, Kai [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, State Key Lab Elect Insulat & Power Equipment, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust clustering; Veins of clusters; Density peaks; Kernel density estimation; K-MEANS ALGORITHM; CROSS-VALIDATION; BANDWIDTH SELECTION; COMMUNITY DETECTION; FAST SEARCH; MODEL; PEAKS; FIND;
DOI
10.1016/j.knosys.2018.06.021
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering by fast search and find of density peaks (DPC) is an efficient clustering algorithm proposed by Rodriguez and Laio [49]. It adopts a concise but effective assignment strategy: each data point is assigned to the same cluster as its nearest neighbor of higher density. However, this simple strategy makes it prone to the so-called "chain reaction". Moreover, the accuracy of DPC depends heavily on the choice of the cutoff distance d_c when the scale of the data varies. To retain the advantages of DPC while avoiding these drawbacks, this paper proposes a robust clustering algorithm named IVDPC, which provides a feasible approach to the classification problem for data with different shapes and distributions. The local density is first estimated with a non-parametric density estimation method. Then, by computing the similarity matrix of the points and successively connecting the most similar pairs from high-density regions to the edges of clusters, IVDPC identifies the main structure (veins) of the clusters and assigns each remaining sample to its nearest vein. Representing a cluster by veins rather than by a single representative point allows IVDPC to adapt to the geometry of non-spherical shapes and to reduce the chain reaction of DPC. The proposed method is benchmarked on artificial and real-world data sets against several baseline methods. The experimental results demonstrate that IVDPC can recognize the structure of clusters and achieves higher clustering accuracy than several state-of-the-art algorithms.
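The baseline DPC assignment rule mentioned in the abstract (each point joins the cluster of its nearest neighbor of higher density) can be illustrated with a minimal Python sketch. This is not the authors' IVDPC implementation and does not identify veins. The Gaussian kernel, the fixed `bandwidth`, the choice of centers by the product of density and distance, and the names `kde_density` and `dpc_assign` are all assumptions made for illustration.

```python
import numpy as np

def kde_density(X, bandwidth):
    """Return an unnormalized Gaussian kernel density at each sample,
    together with the pairwise Euclidean distance matrix."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    rho = np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)
    return rho, np.sqrt(d2)

def dpc_assign(X, n_clusters, bandwidth=1.0):
    """DPC-style clustering sketch (hypothetical helper, not from the paper)."""
    rho, dist = kde_density(X, bandwidth)
    n = len(X)
    order = np.argsort(-rho, kind="stable")  # highest density first
    delta = np.zeros(n)                      # distance to nearest denser point
    parent = np.full(n, -1)                  # index of that denser point

    for rank, i in enumerate(order):
        if rank == 0:
            # The globally densest point has no denser neighbor.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Assumed center selection: the largest values of rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf                 # densest point is always a center
    centers = np.argsort(-gamma)[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Visit points from high to low density so that every parent
    # already carries a label when its child is processed.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Because every non-center point inherits its label through a single parent link, one wrong link is passed on to all points that depend on it, which is one way to picture the chain reaction the abstract refers to.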
Pages: 309 - 320
Page count: 12
Related papers
50 in total
  • [1] Stream Clustering Based on Kernel Density Estimation
    Lodi, Stefano
    Moro, Gianluca
    Sartori, Claudio
    [J]. ECAI 2006, PROCEEDINGS, 2006, 141 : 799 - +
  • [2] Density-based Kernel Scale Estimation for Kernel Clustering
    Sellah, Sofiane
    Nasraoui, Olfa
    [J]. 2013 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS AND APPLICATIONS (IISA 2013), 2013, : 248 - 251
  • [3] A robust clustering algorithm based on the identification of core points and KNN kernel density estimation
    Zhou, Zhou
    Si, Gangquan
    Sun, Haodong
    Qu, Kai
    Hou, Weicheng
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 195
  • [4] A new algorithm for clustering based on kernel density estimation
    Matioli, L. C.
    Santos, S. R.
    Kleina, M.
    Leite, E. A.
    [J]. JOURNAL OF APPLIED STATISTICS, 2018, 45 (02) : 347 - 366
  • [5] Robust Kernel Density Estimation
    Kim, JooSeuk
    Scott, Clayton D.
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 2529 - 2565
  • [6] Robust kernel density estimation
    Kim, JooSeuk
    Scott, Clayton
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3381 - 3384
  • [7] Validation tests of an improved kernel density estimation method for identifying disease clusters
    Cai, Qiang
    Rushton, Gerard
    Bhaduri, Budhendra
    [J]. JOURNAL OF GEOGRAPHICAL SYSTEMS, 2012, 14 (03) : 243 - 264
  • [8] Validation tests of an improved kernel density estimation method for identifying disease clusters
    Qiang Cai
    Gerard Rushton
    Budhendra Bhaduri
    [J]. Journal of Geographical Systems, 2012, 14 : 243 - 264
  • [9] MULTIDIMENSIONAL DATA CLUSTERING BASED ON FAST KERNEL DENSITY ESTIMATION
    Yin, Xun-Fu
    [J]. PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 311 - 315
  • [10] DENCLUE 2.0: Fast clustering based on kernel density estimation
    Hinneburg, Alexander
    Gabriel, Hans-Henning
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS VII, PROCEEDINGS, 2007, 4723 : 70 - +