Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors

Cited by: 0
Authors
Zhang Q.-H. [1 ,2 ]
Zhou J.-P. [1 ,2 ]
Dai Y.-Y. [1 ,2 ]
Wang G.-Y. [1 ,2 ]
Affiliations
[1] Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism, Chongqing University of Posts and Telecommunications, Chongqing
[2] Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing
Source
Ruan Jian Xue Bao/Journal of Software | 2023 / Vol. 34 / No. 12
Keywords
cluster analysis; density peaks clustering (DPC); K-nearest neighbors (KNN); representative point;
DOI
10.13328/j.cnki.jos.006756
Abstract
Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude abnormal points. However, DPC still has shortcomings: its local density considers only the global distribution, so clustering performance is poor on datasets with large density differences between clusters, and its point allocation strategy is prone to a domino effect, where one misassigned point propagates errors to the points assigned after it. Hence, this study proposes a DPC algorithm based on representative points and K-nearest neighbors (KNN), namely RKNN-DPC. First, a KNN density is constructed, and representative points are introduced to describe the global distribution of samples, yielding a new local density. Then, the KNN information of samples is used to design a weighted KNN allocation strategy that alleviates the domino effect. Finally, comparative experiments against five clustering algorithms are conducted on artificial and real-world datasets. The results show that RKNN-DPC identifies cluster centers more accurately and obtains better clustering results. © 2023 Chinese Academy of Sciences. All rights reserved.
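The DPC framework the abstract builds on can be sketched in a few steps: compute a local density for each point, compute each point's distance to its nearest higher-density point, pick the points where both are large as cluster centers, and allocate the remaining points. The sketch below is illustrative only: it uses a simple KNN-based density and the classic one-pass DPC allocation, not the paper's exact RKNN-DPC formulas (the representative-point density and the weighted KNN allocation strategy are not reproduced here); all function and parameter names are hypothetical.

```python
import numpy as np

def dpc_knn(X, k=5, n_clusters=2):
    """Illustrative DPC sketch with a KNN-based local density.
    NOT the RKNN-DPC algorithm: representative points and the
    weighted KNN allocation strategy are omitted for brevity."""
    n = len(X)
    # Pairwise Euclidean distance matrix.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # KNN density: larger when the k nearest neighbors are closer
    # (column 0 of the sorted row is the point itself, distance 0).
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]
    rho = np.exp(-knn_d.mean(axis=1))
    # delta: distance to the nearest point of strictly higher density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    order = np.argsort(-rho)                 # indices by descending density
    for rank, i in enumerate(order):
        if rank == 0:                        # global density maximum
            delta[i] = d[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, j]
        nearest_higher[i] = j
    # Centers: the n_clusters points with the largest gamma = rho * delta.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # One-pass allocation in descending density order: each point
    # inherits the label of its nearest higher-density neighbor.
    # This is the step where RKNN-DPC substitutes a weighted KNN
    # vote to alleviate the domino effect.
    for i in order:
        if labels[i] < 0:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

Because allocation follows the nearest-higher-density chain, a single wrong link near a cluster boundary drags all downstream points with it, which is the domino effect the paper's weighted KNN strategy is designed to suppress.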
Pages: 5629-5648
Page count: 19