Common Nearest Neighbor Clustering-A Benchmark

Cited by: 14
Authors
Lemke, Oliver [1]
Keller, Bettina G. [1]
Affiliations
[1] Free Univ Berlin, Dept Biol, Chem, Pharm, Takustr 3, D-14195 Berlin, Germany
Source
ALGORITHMS | 2018 / Vol. 11 / Issue 02
Keywords
density-based clustering; molecular dynamics simulations; Markov state models; core sets; milestoning
DOI
10.3390/a11020019
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cluster analyses are often conducted with the goal of characterizing an underlying probability density, for which the data-point density serves as an estimate. We here test and benchmark the common nearest neighbor (CNN) cluster algorithm. This algorithm assigns a spherical neighborhood of radius R to each data point and estimates the data-point density between two data points as the number of data points N in the overlapping region of their neighborhoods (step 1). The main principle of the CNN cluster algorithm is cluster growing: clusters are grown by sequentially adding data points, which effectively positions the cluster borders along an iso-surface of the underlying probability density. This yields a strict partitioning with outliers, in which the clusters represent peaks in the underlying probability density, termed core sets (step 2). The removal of outliers on the basis of a threshold criterion is optional (step 3). The benchmark datasets address a series of typical challenges, including datasets with a very high-dimensional state space and datasets in which the cluster centroids are aligned along an underlying structure (Birch sets). The performance of the CNN algorithm is evaluated with respect to these challenges. The results indicate that the CNN cluster algorithm can be useful in a wide range of settings. Cluster algorithms are particularly important for the analysis of molecular dynamics (MD) simulations. We demonstrate how the CNN cluster results can be used as a discretization of the molecular state space for the construction of a core-set model of the MD, which improves the accuracy compared to conventional full-partitioning models. The software for the CNN clustering is available on GitHub.
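The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' GitHub implementation; it is a simplified brute-force version written under stated assumptions: two points count as density-connected when they lie within R of each other and share at least N neighbors, clusters are grown as connected components of that relation, and groups smaller than a size threshold are labeled as outliers. The function name `cnn_cluster` and the parameters `radius`, `min_shared`, and `min_cluster_size` are hypothetical names chosen for illustration.

```python
import numpy as np


def cnn_cluster(points, radius, min_shared, min_cluster_size=2):
    """Illustrative common-nearest-neighbor clustering (brute force, O(n^2) memory).

    Assumptions (not taken from the paper's code): points i and j are linked
    if they are within `radius` of each other and share at least `min_shared`
    neighbors; groups smaller than `min_cluster_size` become outliers (-1).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Step 1: neighborhoods of radius R and shared-neighbor counts.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    neighbor = dist <= radius
    np.fill_diagonal(neighbor, False)  # a point is not its own neighbor
    counts = neighbor.astype(int)
    shared = counts @ counts  # shared[i, j] = number of common neighbors
    connected = neighbor & (shared >= min_shared)

    # Step 2: grow clusters by sequentially adding density-connected points.
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    next_label = 0
    for seed in range(n):
        if visited[seed]:
            continue
        visited[seed] = True
        stack, members = [seed], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(connected[i] & ~visited):
                visited[j] = True
                stack.append(j)
        # Step 3: groups below the size threshold stay labeled as outliers.
        if len(members) >= min_cluster_size:
            labels[members] = next_label
            next_label += 1
    return labels
```

Calling `cnn_cluster(data, radius=0.25, min_shared=2)` on an array of shape (n_points, n_dimensions) returns one integer label per point, with -1 marking outliers. The dense distance matrix keeps the sketch short but limits it to small datasets; a neighbor-list or tree-based search would be needed for realistic MD trajectories.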
Pages: 21