The influence of missing value imputation on detection of differentially expressed genes from microarray data

Cited by: 48
Authors
Scheel, I
Aldrin, M
Glad, IK
Sorum, R
Lyng, H
Frigessi, A
Affiliations
[1] Univ Oslo, Dept Math, NO-0316 Oslo, Norway
[2] Norwegian Comp Ctr, Dept Stat Anal Image Anal & Pattern Recognit, NO-0314 Oslo, Norway
[3] Norwegian Radium Hosp, Dept Radiat Biol, NO-0310 Oslo, Norway
[4] Univ Oslo, Dept Biostat, NO-0317 Oslo, Norway
DOI
10.1093/bioinformatics/bti708
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Missing values are problematic for the analysis of microarray data. Imputation methods have been compared in terms of the similarity between imputed and true values in simulation experiments, but not in terms of their influence on the final analysis. The focus has been on values missing at random, although entries are also missing not at random.

Results: We investigate the influence of imputation on the detection of differentially expressed genes from cDNA microarray data. We apply ANOVA for microarrays and SAM and look at the differentially expressed genes that are lost because of imputation. We show that this new measure provides useful information that the traditional root mean squared error cannot capture. We also show that the type of missingness matters: imputing 5% of entries missing not at random has the same effect as imputing 10-30% missing at random. We propose a new method for imputation (LinImp), which fits a simple linear model for each channel separately, and compare it with the widely used KNNimpute method. For 10% missing at random, KNNimpute leads to twice as many lost differentially expressed genes as LinImp.
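The evaluation idea in the abstract (count the differentially expressed genes that are lost after imputation, instead of only measuring RMSE on the imputed entries) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. A Welch t statistic with a fixed, hypothetical cut-off stands in for ANOVA for microarrays and SAM. `knn_impute` is an unweighted simplification of KNNimpute-style imputation. "Missing not at random" is modelled, as one assumed mechanism, by hiding the lowest values. The synthetic data, `k`, and `threshold` are illustrative only. LinImp is not reproduced, because the abstract says only that it fits a simple linear model for each channel separately. All function names are hypothetical.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch two-sample t statistic (a simple stand-in for SAM or ANOVA)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return 0.0 if se == 0 else (statistics.mean(x) - statistics.mean(y)) / se


def de_genes(matrix, group_a, group_b, threshold):
    """Row indices (genes) whose |t| exceeds a fixed, hypothetical cut-off."""
    called = set()
    for g, row in enumerate(matrix):
        t = welch_t([row[j] for j in group_a], [row[j] for j in group_b])
        if abs(t) > threshold:
            called.add(g)
    return called


def mask_mar(matrix, fraction, rng):
    """Missing at random: pick a uniformly random fraction of cells to hide."""
    cells = [(g, j) for g in range(len(matrix)) for j in range(len(matrix[0]))]
    return rng.sample(cells, int(fraction * len(cells)))


def mask_mnar(matrix, fraction):
    """Missing not at random (one assumed mechanism): hide the lowest values."""
    cells = sorted((matrix[g][j], g, j)
                   for g in range(len(matrix)) for j in range(len(matrix[0])))
    return [(g, j) for _, g, j in cells[:int(fraction * len(cells))]]


def apply_mask(matrix, hidden):
    """Copy the matrix and replace the hidden cells with None."""
    masked = [row[:] for row in matrix]
    for g, j in hidden:
        masked[g][j] = None
    return masked


def knn_impute(matrix, k=10):
    """Simplified KNNimpute-style imputation: each None becomes the unweighted
    mean of that column over the k closest rows observed there, using the
    root-mean-square distance over columns observed in both rows. Falls back
    to the column mean when no neighbour is available."""
    n_cols = len(matrix[0])
    col_means = [statistics.mean(r[j] for r in matrix if r[j] is not None)
                 for j in range(n_cols)]
    imputed = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        distances = []
        for h, other in enumerate(matrix):
            shared = [c for c in range(n_cols)
                      if row[c] is not None and other[c] is not None]
            if h != g and shared:
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                distances.append((d, h))
        distances.sort()
        for j in missing:
            values = [matrix[h][j] for _, h in distances if matrix[h][j] is not None][:k]
            imputed[g][j] = statistics.mean(values) if values else col_means[j]
    return imputed


def rmse(full, imputed, hidden):
    """Traditional criterion: RMSE between true and imputed values."""
    return math.sqrt(statistics.mean((full[g][j] - imputed[g][j]) ** 2 for g, j in hidden))


def lost_de(full, imputed, group_a, group_b, threshold):
    """Genes called DE on the complete data but no longer called after imputation."""
    return (de_genes(full, group_a, group_b, threshold)
            - de_genes(imputed, group_a, group_b, threshold))


if __name__ == "__main__":
    rng = random.Random(1)
    group_a, group_b = range(0, 5), range(5, 10)
    # Synthetic log-expression: 200 genes x 10 arrays; genes 0-19 shifted in group B.
    full = []
    for g in range(200):
        base = rng.gauss(8.0, 1.0)
        shift = 1.5 if g < 20 else 0.0
        full.append([base + (shift if j in group_b else 0.0) + rng.gauss(0.0, 0.5)
                     for j in range(10)])
    for name, hidden in [("MAR 10%", mask_mar(full, 0.10, rng)),
                         ("MNAR 10%", mask_mnar(full, 0.10))]:
        imputed = knn_impute(apply_mask(full, hidden))
        lost = lost_de(full, imputed, group_a, group_b, threshold=4.0)
        print(f"{name}: RMSE={rmse(full, imputed, hidden):.3f}, lost DE genes={len(lost)}")
```

Running the script prints, for each missingness type, the RMSE on the hidden entries and the number of lost DE genes. The two criteria are computed independently, so they need not rank imputation settings the same way, which is the kind of difference the abstract says RMSE alone cannot capture.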
Pages: 4272-4279
Number of pages: 8
Related papers
  • [1] Microarray data quality control improves the detection of differentially expressed genes
    Kauffmann, A
    Huber, W
    Genomics, 2010, 95(3): 138-142
  • [2] Effect of normalisation on detection of differentially expressed genes in cDNA microarray data analysis
    Dimauro, C
    Bacciu, N
    Macciotta, NPP
    Italian Journal of Animal Science, 2007, 6: 122-124
  • [3] CIT: identification of differentially expressed clusters of genes from microarray data
    Rhodes, DR
    Miller, JC
    Haab, BB
    Furge, KA
    Bioinformatics, 2002, 18(1): 205-206
  • [4] Selection of differentially expressed genes in microarray data analysis
    Chen, JJ
    Wang, S-J
    Tsai, C-A
    Lin, C-J
    The Pharmacogenomics Journal, 2007, 7(3): 212-220
  • [5] Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data
    Sehgal, MSB
    Gondal, I
    Dooley, LS
    Bioinformatics, 2005, 21(10): 2417-2423
  • [6] Effect of raw data normalisation on detection of differentially expressed genes in cDNA microarray experiments
    Dimauro, C
    Macciotta, NPP
    Cappio-Borlino, A
    Journal of Dairy Science, 2007, 90: 374
  • [7] Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis
    Hoffmann, R
    Seidl, T
    Dugas, M
    Genome Biology, 2002, 3(7)
  • [8] Effect of raw data normalisation on detection of differentially expressed genes in cDNA microarray experiments
    Dimauro, C
    Macciotta, NPP
    Cappio-Borlino, A
    Journal of Animal Science, 2007, 85: 374