A ground truth based comparative study on clustering of gene expression data

Cited by: 5
Authors
Zhu, Yitan [1]
Wang, Zuyi [1,2]
Miller, David J. [3]
Clarke, Robert [4,5]
Xuan, Jianhua [1]
Hoffman, Eric P. [2]
Wang, Yue [1]
Affiliations
[1] Virginia Polytech Inst & State Univ, Dept Elect & Comp Engn, Arlington, VA 22203 USA
[2] Childrens Natl Med Ctr, Res Ctr Genet Med, Washington, DC 20010 USA
[3] Penn State Univ, Dept Elect Engn, University Pk, PA 16802 USA
[4] Georgetown Univ, Dept Oncol & Physiol & Biophys, Washington, DC 20007 USA
[5] Georgetown Univ, Lombardi Comprehens Canc Ctr, Washington, DC 20007 USA
Keywords
clustering evaluation; sample clustering; comparative study; gene expression data
DOI
10.2741/2972
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground-truth-based comparative study of the functionality, accuracy, and stability of five data clustering methods: hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG™ toolkit, the VIsual Statistical Data Analyzer (VISDA). The methods were tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and the mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
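The abstract names its performance measures without defining them. As a rough illustration only, the sketch below shows one common way such ground-truth measures can be computed. It assumes that partition accuracy is the fraction of samples correctly assigned under the best one-to-one mapping between cluster ids and ground-truth classes, and that cluster number detection accuracy is the fraction of runs that recover the true number of clusters. These definitions, the function names, and the toy labels are assumptions for illustration and are not taken from the paper.

```python
from itertools import permutations
from statistics import mean, pstdev


def partition_accuracy(true_labels, cluster_labels):
    """Fraction of samples matched under the best one-to-one mapping of
    cluster ids to ground-truth classes (brute force, small cases only).
    Assumed definition; the paper's exact measure may differ."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Extra clusters (beyond the number of classes) may stay unmatched (None).
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best = max(best, hits)
    return best / len(true_labels)


def cluster_number_detection_accuracy(detected_counts, true_count):
    """Fraction of runs whose detected cluster count equals the true count."""
    return sum(k == true_count for k in detected_counts) / len(detected_counts)


# Hypothetical toy data: six samples in two known classes, three clustering runs.
truth = [0, 0, 0, 1, 1, 1]
runs = [
    [1, 1, 1, 0, 0, 0],  # same partition, permuted ids
    [0, 0, 1, 1, 1, 1],  # one sample misassigned
    [2, 2, 0, 0, 1, 1],  # one cluster too many
]

accuracies = [partition_accuracy(truth, run) for run in runs]
mean_accuracy = mean(accuracies)        # accuracy averaged over runs
std_accuracy = pstdev(accuracies)       # stability: spread across runs
detection = cluster_number_detection_accuracy(
    [len(set(run)) for run in runs], true_count=2
)
print(accuracies, mean_accuracy, std_accuracy, detection)
```

The brute-force search over mappings is only practical for a handful of clusters; for larger problems an assignment solver would be the usual substitute. A low standard deviation across repeated runs corresponds to the stability the abstract attributes to K-means clustering and self-organizing maps.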
Pages: 3839-3849
Number of pages: 11