DNA microarray data imputation and significance analysis of differential expression

Cited by: 88
Authors
Jörnsten, R
Wang, HY
Welsh, WJ
Ouyang, M
Affiliations
[1] Rutgers State Univ, Dept Stat, New Brunswick, NJ 08903 USA
[2] Univ Med & Dent New Jersey, Robert Wood Johnson Med Sch, Dept Pharmacol, Piscataway, NJ 08854 USA
[3] Univ Med & Dent New Jersey, Inst Informat, Piscataway, NJ 08854 USA
DOI
10.1093/bioinformatics/bti638
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Significance analysis of differential expression in DNA microarray data is an important task. Much of the current research is focused on developing improved tests and software tools. The task is difficult not only owing to the high dimensionality of the data (number of genes), but also because of the often non-negligible presence of missing values. There is thus a great need to reliably impute these missing values prior to the statistical analyses. Many imputation methods have been developed for DNA microarray data, but their impact on statistical analyses has not been well studied. In this work we examine how missing values and their imputation affect significance analysis of differential expression.

Results: We develop a new imputation method (LinCmb) that is superior to the widely used methods in terms of normalized root mean squared error. Its estimates are convex combinations of the estimates of existing methods. We find that LinCmb adapts to the structure of the data: if the data are heterogeneous or if there are few missing values, LinCmb puts more weight on local imputation methods; if the data are homogeneous or if there are many missing values, LinCmb puts more weight on global imputation methods. Thus, LinCmb is a useful tool to understand the merits of different imputation methods. We also demonstrate that missing values affect significance analysis. The simulations employ two datasets, different amounts of missing values, different imputation methods, and three tests: the standard t-test, the regularized t-test, and ANOVA. We conclude that good imputation alleviates the impact of missing values and should be an integral part of microarray data analysis. The most competitive methods are LinCmb, GMC and BPCA. Popular imputation schemes such as SVD, row mean, and KNN all exhibit high variance and poor performance. The regularized t-test is less affected by missing values than the standard t-test.
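The two ideas the abstract leans on, normalized root mean squared error and a convex combination of several methods' estimates, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' LinCmb procedure: the NRMSE is taken as the RMSE divided by the standard deviation of the true values (one common definition), the function names `nrmse`, `convex_combination` and `best_two_method_weight` are hypothetical, and the weight is chosen by a simple grid search over two methods only.

```python
import numpy as np

def nrmse(estimate, truth):
    """RMSE normalized by the standard deviation of the true values (assumed definition)."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.sqrt(np.mean((estimate - truth) ** 2) / np.var(truth))

def convex_combination(estimates, weights):
    """Combine per-method estimates (methods x entries) with non-negative weights summing to 1."""
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return weights @ estimates

def best_two_method_weight(est_a, est_b, truth, grid=101):
    """Grid-search the weight on est_a that minimizes NRMSE on entries whose true value is known.

    Illustrative stand-in only; the paper's own weight-fitting procedure is not reproduced here.
    """
    candidates = np.linspace(0.0, 1.0, grid)
    errors = [
        nrmse(convex_combination([est_a, est_b], [w, 1.0 - w]), truth)
        for w in candidates
    ]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]

# Toy data: one method overestimates by 1, the other underestimates by 1.
truth = np.array([1.0, 2.0, 3.0, 4.0])
est_a = truth + 1.0
est_b = truth - 1.0
weight_a, error = best_two_method_weight(est_a, est_b, truth)
print(weight_a, error)
```

In this toy case the two methods err in opposite directions, so an evenly weighted combination recovers the true values. On real data the weights would be fitted on entries that were masked on purpose, so that their true values are known.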
Pages: 4155-4161
Number of pages: 7