Comparing the performance of biomedical clustering methods

Times cited: 0
Authors
Wiwie, Christian [1]
Baumbach, Jan [1,2,3]
Rottger, Richard [1]
Affiliations
[1] Univ Southern Denmark, Dept Math & Comp Sci, Odense, Denmark
[2] Max Planck Inst Informat, Computat Syst Biol, D-66123 Saarbrucken, Germany
[3] Univ Saarland, Cluster Excellence Multimodal Comp & Interact, D-66123 Saarbrucken, Germany
Keywords
PROTEIN-INTERACTION NETWORKS; GENE-EXPRESSION DATA; MICROARRAY DATA; AUTOMATED-METHOD; ALGORITHMS; COMPLEXES; DISCOVERY; DATABASE; MODEL
DOI
10.1038/NMETH.3583
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.
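The abstract describes running each tool with many parameter sets and scoring every resulting clustering with cluster validity indices against a reference. A minimal sketch of that evaluation loop follows, assuming a gold-standard partition is available, using the Rand index as the validity index and a toy similarity-threshold clustering as a hypothetical method. The function names are illustrative only and are not ClustEval's implementation or API.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two clusterings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def threshold_clustering(sim, threshold):
    """Hypothetical toy method: connected components of the graph that
    links every pair of objects with similarity >= threshold."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def best_parameter(sim, gold, thresholds):
    """Score the method under every parameter value and keep the best one."""
    scores = {t: rand_index(gold, threshold_clustering(sim, t)) for t in thresholds}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Four objects forming two obvious groups: {0, 1} and {2, 3}.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
gold = [0, 0, 1, 1]
print(best_parameter(sim, gold, [0.05, 0.5, 0.95]))  # (0.5, 1.0)
```

A threshold that is too low merges everything into one cluster, and one that is too high leaves only singletons. Both score worse than the intermediate value, which is why a parameter sweep matters when comparing tools.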
Pages: 1033-1038
Number of pages: 6
Related papers
  • [21] Performance determinants of unsupervised clustering methods for microbiome data
    Shi, Yushu
    Zhang, Liangliang
    Peterson, Christine B.
    Do, Kim-Anh
    Jenq, Robert R.
    MICROBIOME, 2022, 10 (01)
  • [22] Performance Comparison of Clustering Methods for Gene Family Data
    Wei, Dan
    Jiang, Qingshan
    FRONTIERS IN COMPUTER EDUCATION, 2012, 133 : 827 - +
  • [23] Exploring performance of clustering methods on document sentiment analysis
    Ma, Baojun
    Yuan, Hua
    Wu, Ye
    JOURNAL OF INFORMATION SCIENCE, 2017, 43 (01) : 54 - 74
  • [25] Performance evaluation of density-based clustering methods
    Aliguliyev, Ramiz M.
    INFORMATION SCIENCES, 2009, 179 (20) : 3583 - 3602
  • [27] Performance Efficiency and Effectiveness of Clustering Methods for Microarray Datasets
    Chormunge, Smita
    Jena, Sudarson
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND INFORMATICS, ICACNI 2015, VOL 2, 2016, 44 : 557 - 567
  • [28] On the performance of two clustering methods for spatial functional data
    Romano, Elvira
    Mateu, Jorge
    Giraldo, Ramon
    ASTA-ADVANCES IN STATISTICAL ANALYSIS, 2015, 99 (04) : 467 - 492
  • [29] Comparing Various Methods to Compute the NVH Performance of a PMSM
    Leconte, Vincent
    Rodriguez, Alejandro
    Huang, Limin
    Lombard, Patrick
    Um, Doojong
    2021 24TH INTERNATIONAL CONFERENCE ON ELECTRICAL MACHINES AND SYSTEMS (ICEMS 2021), 2021, : 1578 - 1581