A ground truth based comparative study on clustering of gene expression data

Cited by: 5
Authors
Zhu, Yitan [1]
Wang, Zuyi [1,2]
Miller, David J. [3]
Clarke, Robert [4,5]
Xuan, Jianhua [1]
Hoffman, Eric P. [2]
Wang, Yue [1]
Affiliations
[1] Virginia Polytech Inst & State Univ, Dept Elect & Comp Engn, Arlington, VA 22203 USA
[2] Childrens Natl Med Ctr, Res Ctr Genet Med, Washington, DC 20010 USA
[3] Penn State Univ, Dept Elect Engn, University Pk, PA 16802 USA
[4] Georgetown Univ, Dept Oncol & Physiol & Biophys, Washington, DC 20007 USA
[5] Georgetown Univ, Lombardi Comprehens Canc Ctr, Washington, DC 20007 USA
Keywords
clustering evaluation; sample clustering; comparative study; gene expression data
DOI
10.2741/2972
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG™ toolkit (VIsual Statistical Data Analyzer, VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
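The performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that partition accuracy is the fraction of samples correctly labeled under the best one-to-one mapping of clusters onto ground-truth classes, and that cluster number detection accuracy is the fraction of runs whose cluster count equals the true class count. The function names `partition_accuracy` and `summarize_runs` are hypothetical, and the use of the population standard deviation is also an assumption.

```python
from itertools import permutations
from statistics import mean, pstdev


def partition_accuracy(true_labels, pred_labels):
    """Fraction of samples labeled correctly under the best one-to-one
    mapping of predicted clusters onto ground-truth classes (assumed
    definition; labels must not be None)."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and equal in length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # Pad the shorter side with None so that surplus clusters or classes
    # simply stay unmatched when the two counts differ.
    size = max(len(classes), len(clusters))
    classes += [None] * (size - len(classes))
    clusters += [None] * (size - len(clusters))
    best_hits = 0
    # Brute-force search over mappings; fine for the small cluster counts
    # typical of sample clustering, too slow for many clusters.
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t)
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def summarize_runs(true_labels, runs):
    """Summarize repeated clustering runs against one ground truth."""
    accuracies = [partition_accuracy(true_labels, run) for run in runs]
    true_k = len(set(true_labels))
    correct_k = sum(1 for run in runs if len(set(run)) == true_k)
    return {
        "mean_accuracy": mean(accuracies),
        "std_accuracy": pstdev(accuracies),  # population std (assumption)
        "cluster_number_detection_accuracy": correct_k / len(runs),
    }


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    runs = [
        [1, 1, 0, 0, 2, 2],  # same partition, permuted cluster names
        [0, 0, 0, 1, 2, 2],  # one sample misassigned
        [0, 0, 0, 0, 1, 1],  # two classes merged, so only 2 clusters found
    ]
    print(summarize_runs(truth, runs))
```

A low standard deviation of partition accuracy across runs corresponds to what the abstract calls a stable solution. For more than a handful of clusters, the permutation search would normally be replaced by an assignment solver.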
Pages: 3839-3849
Page count: 11