Selection of differentially expressed genes in microarray data analysis

Cited by: 85
Authors
Chen, J. J.
Wang, S-J
Tsai, C-A
Lin, C-J
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Div Biometry & Risk Assessment, Jefferson, AR 72079 USA
[2] US FDA, Ctr Drug Evaluat & Res, Off Translat Sci, Off Biostat, Silver Spring, MD USA
[3] Acad Sinica, Inst Stat Sci, Taipei 115, Taiwan
Source
PHARMACOGENOMICS JOURNAL | 2007, Vol. 7, Issue 03
Keywords
fold-change; gene ranking; P-value; permutation test; volcano plot
DOI
10.1038/sj.tpj.6500412
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
One common objective in microarray experiments is to identify a subset of genes that are differentially expressed across experimental conditions, for example, between drug treatment and no drug treatment. Often, the goal is to determine the underlying relationship between poor and good gene signatures for identifying biological functions or predicting specific therapeutic outcomes. Because of the complexity of studying hundreds or thousands of genes in an experiment, selection of a subset of genes to enhance relationships among the underlying biological structures or to improve prediction accuracy of clinical outcomes has been an important issue in microarray data analysis. Selection of differentially expressed genes is a two-step process. The first step is to select an appropriate test statistic and compute the P-value. The genes are ranked according to their P-values as evidence of differential expression. The second step is to assign a significance level, that is, to determine a cutoff threshold from the P-values in accordance with the study objective. In this paper, we consider four commonly used statistics, the t-, S- (SAM), U- (Mann-Whitney) and M-statistics, to compute the P-values for gene ranking. For assigning the significance level, we consider procedures that control the false-positive error, namely the family-wise error rate and the false discovery rate, to select a limited number of genes, and a receiver-operating characteristic (ROC) approach to select a larger number of genes. The ROC approach is particularly useful in genomic/genetic profiling studies. The well-known colon cancer data containing 22 normal and 40 tumor tissues are used to illustrate different gene ranking and significance level assignment methods for applications to genomic/genetic profiling studies. The P-values computed from the t-, U- and M-statistics are very similar.
We discuss the common practice that uses the P-value, a false-positive error probability, as the primary criterion, and then uses the fold-change as a surrogate measure of biological significance for gene selection. The P-value and the fold-change can be shown simultaneously in a volcano plot. We also address several issues in gene selection.
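The two-step process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the Welch form of the t-statistic is an assumption, and the Benjamini-Hochberg step-up procedure is used only as one common way to control the false discovery rate (the abstract does not name a specific procedure). Expression values are assumed to be on a log2 scale, so that a difference of group means is a log2 fold-change.

```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t-statistic with unequal variances (assumed Welch form)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return 0.0  # degenerate case: no variation in either group
    return (mean(x) - mean(y)) / se


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Step 1: two-sided permutation P-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_values, alpha=0.05):
    """Step 2 (family-wise error control): keep genes with P <= alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.05):
    """Step 2 (FDR control): step-up procedure, returns selected gene indices."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank  # largest rank that passes its threshold
    return sorted(order[:cutoff_rank])


def volcano_point(x, y, p):
    """Volcano-plot coordinates: (log2 fold-change, -log10 P) for log2-scale data."""
    return mean(x) - mean(y), -math.log10(p)
```

Each gene would contribute one P-value from `permutation_p_value`; the list of P-values is then passed to `bonferroni` or `benjamini_hochberg` to choose the cutoff, and `volcano_point` gives the pair of values plotted for that gene in a volcano plot.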
Pages: 212-220
Page count: 9