Permutation-validated principal components analysis of microarray data

Cited by: 0
Authors
Landgrebe, Jobst [2 ]
Wurst, Wolfgang [2 ,3 ]
Welzl, Gerhard [1 ]
Affiliations
[1] GSF Natl Res Ctr Environm & Hlth, Inst Biomath & Biometry, D-85764 Neuherberg, Germany
[2] Max Planck Inst Psychiat, D-80804 Munich, Germany
[3] GSF Natl Res Ctr Environm & Hlth, Inst Mammalian Genet, D-85764 Neuherberg, Germany
Source
GENOME BIOLOGY | 2002 / Vol. 3 / Issue 04
DOI
Not available
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.
Results: We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes.
Conclusions: Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.
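Illustrative sketch (not part of the record): the abstract does not give the exact test statistics, so the Python code below is only a minimal illustration of the general idea, not the authors' procedure. It assumes a genes x arrays matrix, centers each gene across arrays before PCA, uses a one-way ANOVA-style ratio of between-group to within-group variance as the per-gene statistic, and permutes the group labels to obtain an empirical p-value. The function names `pca_explained_variance`, `f_ratio` and `permutation_pvalues`, and the synthetic data, are hypothetical.

```python
import numpy as np

def pca_explained_variance(X):
    """Fraction of total variance carried by each principal component.
    X is genes x arrays; each gene (row) is centered first (an assumption)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()

def f_ratio(X, groups):
    """Per-gene ratio of between-group to within-group variance."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, k = X.shape[1], len(labels)
    grand = X.mean(axis=1)
    between = np.zeros(X.shape[0])
    within = np.zeros(X.shape[0])
    for g in labels:
        Xg = X[:, groups == g]
        mg = Xg.mean(axis=1)
        between += Xg.shape[1] * (mg - grand) ** 2
        within += ((Xg - mg[:, None]) ** 2).sum(axis=1)
    return (between / (k - 1)) / (within / (n - k))

def permutation_pvalues(X, groups, n_perm=200, seed=0):
    """Empirical p-value per gene from random relabelling of the arrays."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = f_ratio(X, groups)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        exceed += f_ratio(X, rng.permutation(groups)) >= observed
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (exceed + 1) / (n_perm + 1)

# Synthetic example: 3 groups x 4 replicate arrays, 50 genes,
# of which the first 5 differ between groups.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 4)
X = rng.normal(size=(50, 12))
X[:5, groups == 1] += 4.0
X[:5, groups == 2] -= 4.0

explained = pca_explained_variance(X)
pvals = permutation_pvalues(X, groups)
selected = np.flatnonzero(pvals < 0.05)  # indices of candidate genes
```

In this sketch a gene is selected only when its group separation is large relative to the scatter among replicates, which mirrors the abstract's emphasis on between-group versus within-group variance. A real analysis would also need a correction for multiple testing across genes.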
Pages: 11
Related papers
  • [11] Integrating Data Transformation in Principal Components Analysis
    Maadooliat, Mehdi
    Huang, Jianhua Z.
    Hu, Jianhua
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2015, 24 (01) : 84 - 103
  • [12] A study of principal components analysis for mixed data
    Kalantan, Zakiah I.
    Alqahtani, Nada A.
    INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2019, 6 (12): : 99 - 104
  • [13] Functional principal components analysis with survey data
    Cardot, Herve
    Chaouch, Mohamed
    Goga, Camelia
    Labruere, Catherine
    FUNCTIONAL AND OPERATORIAL STATISTICS, 2008, : 95 - 102
  • [14] Gene selection for microarray data analysis using principal component analysis
    Wang, AT
    Gehan, EA
    STATISTICS IN MEDICINE, 2005, 24 (13) : 2069 - 2087
  • [15] K-means Clustering and Principal Components Analysis of Microarray Data of L1000 Landmark Genes
    Clayman, Carly L.
    Srinivasan, Satish M.
    Sangwan, Raghvinder S.
    COMPLEX ADAPTIVE SYSTEMS, 2020, 168 : 97 - 104
  • [16] Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data
    Jonnalagadda, Sudhakar
    Srinivasan, Rajagopalan
    BMC BIOINFORMATICS, 2008, 9 (1)
  • [17] Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data
    Sudhakar Jonnalagadda
    Rajagopalan Srinivasan
    BMC Bioinformatics, 9
  • [18] A COMPUTER PROGRAM FOR PRINCIPAL COMPONENTS ANALYSIS OF LEARNING DATA
    WAINER, H
    BEHAVIORAL SCIENCE, 1970, 15 (02): : 206 - &
  • [19] Principal components analysis of nonstationary time series data
    Lansangan, Joseph Ryan G.
    Barrios, Erniel B.
    STATISTICS AND COMPUTING, 2009, 19 (02) : 173 - 187
  • [20] SPARSE LOGISTIC PRINCIPAL COMPONENTS ANALYSIS FOR BINARY DATA
    Lee, Seokho
    Huang, Jianhua Z.
    Hu, Jianhua
    ANNALS OF APPLIED STATISTICS, 2010, 4 (03): : 1579 - 1601