A ground truth based comparative study on clustering of gene expression data

Cited by: 5
Authors
Zhu, Yitan [1 ]
Wang, Zuyi [1 ,2 ]
Miller, David J. [3 ]
Clarke, Robert [4 ,5 ]
Xuan, Jianhua [1 ]
Hoffman, Eric P. [2 ]
Wang, Yue [1 ]
Affiliations
[1] Virginia Polytech Inst & State Univ, Dept Elect & Comp Engn, Arlington, VA 22203 USA
[2] Childrens Natl Med Ctr, Res Ctr Genet Med, Washington, DC 20010 USA
[3] Penn State Univ, Dept Elect Engn, University Pk, PA 16802 USA
[4] Georgetown Univ, Dept Oncol & Physiol & Biophys, Washington, DC 20007 USA
[5] Georgetown Univ, Lombardi Comprehens Canc Ctr, Washington, DC 20007 USA
Keywords
clustering evaluation; sample clustering; comparative study; gene expression data;
DOI
10.2741/2972
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG™ toolkit, the VIsual Statistical Data Analyzer (VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and the mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
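The performance measures named in the abstract can be illustrated with a small sketch. The abstract does not give their formulas, so the definitions below are assumptions: partition accuracy is taken as the fraction of samples that agree with the ground truth under the best one-to-one mapping of clusters to classes, and cluster number detection accuracy as the fraction of runs that recover the true number of clusters. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from statistics import mean, pstdev


def partition_accuracy(true_labels, cluster_labels):
    """Assumed definition: share of samples matched under the best
    one-to-one mapping of clusters to ground-truth classes."""
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and equal in length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Pad with None so that surplus clusters map to "no class".
    size = max(len(classes), len(clusters))
    padded = classes + [None] * (size - len(classes))
    best = 0
    # Brute-force search over mappings; only practical for a handful of clusters.
    for perm in permutations(padded, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t)
        best = max(best, hits)
    return best / len(true_labels)


def cluster_number_detection_accuracy(detected_counts, true_count):
    """Assumed definition: share of runs that detect the true cluster number."""
    return sum(1 for k in detected_counts if k == true_count) / len(detected_counts)


# Toy ground truth and three hypothetical clustering runs (not data from the study).
truth = [0, 0, 0, 1, 1, 1]
runs = [
    [1, 1, 1, 0, 0, 0],  # perfect partition up to label swap
    [1, 1, 0, 0, 0, 0],  # one sample misassigned
    [0, 0, 1, 1, 1, 0],  # two samples misassigned
]

accuracies = [partition_accuracy(truth, run) for run in runs]
print("mean partition accuracy:", mean(accuracies))
print("std of partition accuracy:", pstdev(accuracies))
print("cluster number detection accuracy:",
      cluster_number_detection_accuracy([len(set(r)) for r in runs], 2))
```

Under these assumptions, the mean summarizes how accurate a method is across repeated runs, and the standard deviation summarizes how stable it is. For more than a few clusters, the exhaustive search over mappings would normally be replaced by an assignment solver.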
Pages: 3839-3849
Number of pages: 11