Common Nearest Neighbor Clustering – A Benchmark

Times cited: 14

Authors:

Lemke, Oliver [1]

Keller, Bettina G. [1]

Affiliation:

[1] Free Univ Berlin, Dept Biol, Chem, Pharm, Takustr 3, D-14195 Berlin, Germany

Source:

ALGORITHMS | 2018 / Vol. 11 / Issue 02

Keywords:

density-based clustering; molecular dynamics simulations; Markov state models; core sets; milestoning

DOI:

10.3390/a11020019

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Cluster analyses are often conducted with the goal of characterizing an underlying probability density, for which the data-point density serves as an estimate. Here, we test and benchmark the common nearest neighbor (CNN) cluster algorithm. This algorithm assigns a spherical neighborhood of radius R to each data point and estimates the data-point density between two data points as the number of data points in the overlapping region of their neighborhoods; two points are considered density-connected if this overlap contains at least N data points (step 1). The main principle of the CNN cluster algorithm is cluster growing: clusters are grown by sequentially adding data points, which effectively positions the cluster borders along an iso-surface of the underlying probability density. This yields a strict partitioning with outliers, in which the clusters represent peaks in the underlying probability density, termed core sets (step 2). The removal of the outliers on the basis of a threshold criterion is optional (step 3). The benchmark datasets address a series of typical challenges, including datasets with a very high-dimensional state space and datasets in which the cluster centroids are aligned along an underlying structure (Birch sets). The performance of the CNN algorithm is evaluated with respect to these challenges. The results indicate that the CNN cluster algorithm can be useful in a wide range of settings. Cluster algorithms are particularly important for the analysis of molecular dynamics (MD) simulations. We demonstrate how the CNN cluster results can be used as a discretization of the molecular state space for the construction of a core-set model of the MD simulation, which improves the accuracy compared to conventional full-partitioning models. The software for CNN clustering is available on GitHub.
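The clustering criterion and the core-set idea in the abstract can be illustrated with a minimal pure-Python sketch. It assumes Euclidean distances, a neighbor test of d ≤ R, a brute-force O(n²) neighbor search, and that points which never join a cluster are reported as outliers with label -1. The function names `cnn_cluster`, `neighborhoods` and `core_set_trajectory` are hypothetical and are not the API of the authors' GitHub package. The second function shows a milestoning-style assignment commonly used for core-set models, in which an outlier frame of a trajectory is attributed to the last core set visited. This is offered as an illustration of the general technique, not as the exact procedure used in the paper.

```python
import math
from collections import deque


def neighborhoods(points, R):
    """For each point, the set of indices within distance R (self excluded)."""
    n = len(points)
    neigh = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= R:  # assumption: d <= R
                neigh[i].add(j)
                neigh[j].add(i)
    return neigh


def cnn_cluster(points, R, N):
    """Hypothetical CNN sketch: neighboring points i, j join the same cluster
    if their R-neighborhoods share at least N points (step 1); clusters are
    grown breadth-first from unassigned seeds (step 2). Label -1 = outlier."""
    neigh = neighborhoods(points, R)
    labels = [-1] * len(points)
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue  # already part of an earlier cluster
        labels[seed] = current
        size = 1
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in neigh[i]:
                # common-nearest-neighbor criterion on the overlap of neighborhoods
                if labels[j] == -1 and len(neigh[i] & neigh[j]) >= N:
                    labels[j] = current
                    size += 1
                    queue.append(j)
        if size > 1:
            current += 1
        else:
            labels[seed] = -1  # a seed that grew nothing is an outlier
    return labels


def core_set_trajectory(labels):
    """Milestoning-style assignment: each outlier frame (-1) inherits the last
    core set visited; frames before the first core-set visit stay -1."""
    assigned, last = [], -1
    for lab in labels:
        if lab != -1:
            last = lab
        assigned.append(last)
    return assigned


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (2.5, 2.5)]
    print(cnn_cluster(pts, R=0.2, N=2))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Raising N makes the criterion stricter: with N=3 the two small blobs above no longer satisfy it, and every point is reported as an outlier. For large datasets, the quadratic neighbor search would typically be replaced by a tree-based or approximate neighbor search.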


Pages: 21
