The impact of sample imbalance on identifying differentially expressed genes

Cited by: 19
Authors
Yang, Kun [1]
Li, Jianzhong [1]
Gao, Hong [1]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China
DOI
10.1186/1471-2105-7-S4-S8
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704
Abstract
Background: Several statistical methods have recently been proposed to identify genes that are differentially expressed between two conditions. However, very few studies consider sample imbalance, and none has investigated its impact on identifying differentially expressed genes. It is also unclear which method is best suited to unbalanced data. Results: Two evaluation models based on random sampling are proposed to investigate the impact of sample imbalance on identifying differentially expressed genes. Using these models, the performance of six well-known methods is compared on unbalanced data. The experimental results indicate that sample imbalance strongly influences the selection of differentially expressed genes, and that the methods perform very differently on unbalanced data. Among the six methods, the Welch t-test appears to perform best when the group with the larger variance also has the larger sample size, while the regularized t-test and SAM outperform the others on unbalanced data in the remaining cases. Conclusion: The two proposed evaluation models are effective, and sample imbalance should be taken into account in microarray experiment design and gene expression data analysis. The results and the two evaluation models can help in selecting a suitable method for processing unbalanced data.
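The abstract names the Welch t-test and a random-sampling approach but does not spell out the evaluation models. The following is a minimal sketch, not the authors' procedure: it computes the standard Welch t statistic for one gene and repeatedly draws subsamples of chosen sizes from two groups, so the effect of unequal group sizes on the score can be inspected. The function names `welch_t` and `subsample_scores`, the synthetic expression values, and the chosen sample sizes are all assumptions made for illustration.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # unbiased sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def subsample_scores(group_a, group_b, n_a, n_b, n_rounds=200, seed=0):
    """Hypothetical helper: draw n_a and n_b samples without replacement,
    n_rounds times, and collect the Welch t statistic of each draw."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        a = rng.sample(group_a, n_a)
        b = rng.sample(group_b, n_b)
        scores.append(welch_t(a, b)[0])
    return scores


if __name__ == "__main__":
    # Synthetic expression values for a single gene (assumed, not from the paper):
    # one group with large variance, one with small variance.
    rng = random.Random(1)
    high_var = [rng.gauss(0.0, 2.0) for _ in range(30)]
    low_var = [rng.gauss(1.0, 0.5) for _ in range(30)]
    # Compare a balanced design with two unbalanced ones of the same total size.
    for n_a, n_b in [(6, 6), (10, 2), (2, 10)]:
        scores = subsample_scores(high_var, low_var, n_a, n_b)
        print(n_a, n_b, round(statistics.stdev(scores), 3))
```

Printing the spread of the statistic across draws is only one possible summary; a fuller comparison would rank many genes and measure how stable the selected gene list is under each design.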
Pages: 13
Related papers
50 in total
  • [1] The impact of sample imbalance on identifying differentially expressed genes
    Kun Yang
    Jianzhong Li
    Hong Gao
    [J]. BMC Bioinformatics, 7
  • [2] Sample size for identifying differentially expressed genes in microarray experiments
    Wang, SJ
    Chen, JJ
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2004, 11 (04) : 714 - 726
  • [3] Identifying differentially expressed genes for ordinal phenotypes
    Kim, Yongkang
    Park, Taesung
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2014
  • [4] Ranking analysis for identifying differentially expressed genes
    Qi, Yunsong
    Sun, Huaijiang
    Sun, Quansen
    Pan, Lei
    [J]. GENOMICS, 2011, 97 (05) : 326 - 329
  • [5] A new framework for identifying differentially expressed genes
    Li, Jie
    Tang, Xianglong
    Zhao, Wei
    Huang, Jianhua
    [J]. PATTERN RECOGNITION, 2007, 40 (11) : 3249 - 3262
  • [6] Identifying differentially expressed genes in dye-swapped microarray experiments of small sample size
    Lian, I. B.
    Chang, C. J.
    Liang, Y. J.
    Yang, M. J.
    Fann, C. S. J.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (05) : 2602 - 2620
  • [7] Protocol for identifying differentially expressed genes the RumBall
    Nagai, Luis Augusto Eijy
    Lee, Seohyun
    Nakato, Ryuichiro
    [J]. STAR PROTOCOLS, 2024, 5 (01):
  • [8] Identifying differentially expressed genes in cDNA microarray experiments
    Baggerly, KA
    Coombes, KR
    Hess, KR
    Stivers, DN
    Abruzzo, LV
    Zhang, W
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2001, 8 (06) : 639 - 659
  • [9] SPRING: A METHOD FOR IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA
    Tian, Yuan
    Liu, Guixia
    Wu, Chunguo
    Rong, Guang
    Sun, An
    [J]. BIOTECHNOLOGY & BIOTECHNOLOGICAL EQUIPMENT, 2013, 27 (05) : 4150 - 4156
  • [10] Nonparametric methods for identifying differentially expressed genes in microarray data
    Troyanskaya, OG
    Garber, ME
    Brown, PO
    Botstein, D
    Altman, RB
    [J]. BIOINFORMATICS, 2002, 18 (11) : 1454 - 1461