Common Nearest Neighbor Clustering: A Benchmark

Cited by: 14
Authors
Lemke, Oliver [1]
Keller, Bettina G. [1]
Affiliation
[1] Free Univ Berlin, Dept Biol, Chem, Pharm, Takustr 3, D-14195 Berlin, Germany
Source
ALGORITHMS | 2018 / Vol. 11 / No. 2
Keywords
density-based clustering; molecular dynamics simulations; Markov state models; core sets; milestoning;
DOI
10.3390/a11020019
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cluster analyses are often conducted with the goal of characterizing an underlying probability density, for which the data-point density serves as an estimate. Here we test and benchmark the common nearest neighbor (CNN) cluster algorithm. This algorithm assigns a spherical neighborhood of radius R to each data point and estimates the data-point density between two data points as the number of data points N in the overlapping region of their neighborhoods (step 1). The main principle of the CNN cluster algorithm is cluster growing: clusters are grown by sequentially adding data points, which effectively positions the cluster borders along an iso-surface of the underlying probability density. This yields a strict partitioning with outliers, in which the clusters represent the peaks of the underlying probability density, termed core sets (step 2). The removal of outliers on the basis of a threshold criterion is optional (step 3). The benchmark datasets address a series of typical challenges, including datasets with a very high-dimensional state space and datasets in which the cluster centroids are aligned along an underlying structure (Birch sets). The performance of the CNN algorithm is evaluated with respect to each of these challenges. The results indicate that the CNN cluster algorithm can be useful in a wide range of settings. Cluster algorithms are particularly important for the analysis of molecular dynamics (MD) simulations. We demonstrate how the CNN cluster results can be used as a discretization of the molecular state space for the construction of a core-set model of the MD, which improves the accuracy compared with conventional full-partitioning models. The software for the CNN clustering is available on GitHub.
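The three steps summarized in the abstract can be illustrated with a small, brute-force Python sketch. It is not the authors' GitHub software, and all names (`cnn_cluster`, `n_common`, `min_size`, `core_set_trajectory`, `count_transitions`) are hypothetical. It assumes a common formulation of the CNN criterion: two points are linked when they lie within distance R of each other and share at least N common neighbors (a point is not counted as its own neighbor). Clusters are then grown by following these links, points without any link are outliers, and the optional step 3 is modeled, as an assumption, by discarding clusters smaller than `min_size`. The last two functions sketch, again under stated assumptions, how core-set labels on an MD trajectory could be turned into transition counts for a core-set model.

```python
import numpy as np
from collections import deque


def neighbor_lists(points, radius):
    """Step 1 (sketch): indices within `radius` of each point, self excluded."""
    # Brute-force pairwise distances: O(n^2) memory, for small examples only.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = dist <= radius
    np.fill_diagonal(within, False)
    return [set(np.flatnonzero(row)) for row in within]


def cnn_cluster(points, radius, n_common, min_size=2):
    """Hypothetical CNN clustering sketch; returns labels 0, 1, ... or -1 (noise).

    Assumed criterion: i and j are linked if they are neighbors of each other
    and share at least `n_common` neighbors (step 1). Clusters grow from a
    seed by breadth-first search over links (step 2). Singletons are always
    outliers; clusters smaller than `min_size` are also dropped (step 3).
    """
    points = np.asarray(points, dtype=float)
    neighbors = neighbor_lists(points, radius)
    labels = np.full(len(points), -1, dtype=int)
    clusters = []
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue  # already absorbed by an earlier cluster
        label = len(clusters)
        labels[seed] = label
        members, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                # Add j if unassigned and it satisfies the common-neighbor test.
                if labels[j] == -1 and len(neighbors[i] & neighbors[j]) >= n_common:
                    labels[j] = label
                    members.append(j)
                    queue.append(j)
        clusters.append(members)
    # Keep sufficiently large clusters and relabel them by decreasing size.
    threshold = max(min_size, 2)
    kept = sorted((m for m in clusters if len(m) >= threshold), key=len, reverse=True)
    result = np.full(len(points), -1, dtype=int)
    for new_label, members in enumerate(kept):
        result[members] = new_label
    return result


def core_set_trajectory(labels):
    """Milestoning-style assignment (assumption): a noise frame (-1) keeps the
    label of the last core set visited; frames before the first visit stay -1."""
    out = np.array(labels, dtype=int)
    last = -1
    for t, lab in enumerate(out):
        if lab == -1:
            out[t] = last
        else:
            last = lab
    return out


def count_transitions(traj, n_states, lag=1):
    """Count core-set transitions at lag `lag` (>= 1), skipping unassigned frames."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for a, b in zip(traj[:-lag], traj[lag:]):
        if a >= 0 and b >= 0:
            counts[a, b] += 1
    return counts
```

For example, with two tight groups of five points in the plane and one isolated point, `cnn_cluster(points, radius=0.2, n_common=2)` should label the groups 0 and 1 and mark the isolated point as -1. The dense distance matrix is only meant for illustration; for larger datasets a tree-based radius search such as `scipy.spatial.cKDTree.query_ball_point` would be the usual substitute.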
Pages: 21
Related papers
50 in total
  • [21] XML clustering based on common neighbor
    Lv, TY
    Zhang, XZ
    Zuo, WL
    Wang, ZX
    ADVANCED WEB AND NETWORK TECHNOLOGIES, AND APPLICATIONS, PROCEEDINGS, 2006, 3842 : 137 - 141
  • [22] Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions
    Bubeck, Sebastien
    von Luxburg, Ulrike
    JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 657 - 698
  • [23] Characterizations of nearest and farthest neighbor algorithms by clustering admissibility conditions
    Florida Int Univ, Miami, United States
    Pattern Recognit, 10 (1573-1578):
  • [24] Efficient Nearest-Neighbor Query and Clustering of Planar Curves
    Aronov, Boris
    Filtser, Omrit
    Horton, Michael
    Katz, Matthew J.
    Sheikhan, Khadijeh
    ALGORITHMS AND DATA STRUCTURES, WADS 2019, 2019, 11646 : 28 - 42
  • [25] Adaptive nearest neighbor classifier based on supervised ellipsoid clustering
    Zhang, Guo-Jun
    Du, Ji-Xiang
    Huang, De-Shuang
    Lok, Tat-Ming
    Lyu, Michael R.
    FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4223 : 582 - 585
  • [26] EFFECTIVE ALGORITHMS FOR THE NEAREST-NEIGHBOR METHOD IN THE CLUSTERING PROBLEM
    HATTORI, K
    TORII, Y
    PATTERN RECOGNITION, 1993, 26 (05) : 741 - 746
  • [27] Spectral Clustering Based on k-Nearest Neighbor Graph
    Lucinska, Malgorzata
    Wierzchon, Slawomir T.
    COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT (CISIM), 2012, 7564 : 254 - 265
  • [28] A New Nearest Neighbor Median Shift Clustering for Binary Data
    Beck, Gael
    Lebbah, Mustapha
    Azzag, Hanene
    Duong, Tarn
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 101 - 112
  • [29] Phase correlation and clustering of a nearest neighbor coupled oscillators system
    El-Nashar, HF
    INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 2003, 13 (11) : 3473 - 3481
  • [30] An improved SLIC superpixels using reciprocal nearest neighbor clustering
    School of Mathematics and Computer Science, Panzhihua University, Panzhihua, Sichuan, China
    Int. J. Signal Process. Image Process. Pattern Recogn., 5 (239-248):