Missing value imputation for microarray data: a comprehensive comparison study and a web tool

Cited by: 26
Authors
Chiu, Chia-Chun [1]
Chan, Shih-Yao [1]
Wang, Chung-Ching [1]
Wu, Wei-Sheng [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Elect Engn, Tainan 701, Taiwan
Source
BMC SYSTEMS BIOLOGY | 2013 / Vol. 7
Keywords
CELL-CYCLE TRANSCRIPTION; GENE-EXPRESSION DATA; SACCHAROMYCES-CEREVISIAE; REGULATORY MODULES; IDENTIFICATION; PROFILES; LYMPHOMA; IMPACT;
DOI
10.1186/1752-0509-7-S6-S12
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Microarray data usually contain missing values for a variety of reasons, yet most downstream analyses of microarray data require complete datasets. Accurate algorithms for missing value estimation are therefore needed to improve the performance of microarray data analyses. Although many algorithms have been developed, the choice of the optimal algorithm is still debated. Existing performance comparisons are not comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used.
Results: In this paper, we performed a comprehensive comparison using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, we evaluated how different types of microarray datasets affect the performance of each imputation algorithm. Second, we examined whether datasets from different species affect the performance of the algorithms differently. All evaluations were performed using the three types of measures. Our results indicate that the performance of an imputation algorithm depends mainly on the type of dataset, not on the species from which the samples come. In addition to the statistical measure, two other measures with biological meaning are useful for reflecting the impact of missing value imputation on downstream data analyses. Our study suggests that local-least-squares-based methods are good choices for handling missing values in most microarray datasets.
Conclusions: In this work, we carried out a comprehensive comparison of algorithms for microarray missing value imputation. Based on this comparison, researchers can more easily choose the optimal algorithm for their datasets. Moreover, new imputation algorithms can be compared with existing ones using this comparison strategy as a standard protocol. In addition, to help researchers deal with missing values, we built an easy-to-use web-based imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA determines the optimal algorithm for the users' data through a series of simulations, and the imputed results can then be downloaded for downstream data analyses.
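The simulation-based comparison described in the abstract can be illustrated with a minimal sketch. It is not MissVIA's implementation and does not reproduce any of the nine algorithms in the study: zero imputation and row-mean imputation are simple stand-ins, the normalized root mean squared error (NRMSE) is assumed here as the statistical measure, and all function names, the missing rate, and the synthetic data are hypothetical.

```python
import numpy as np

def mask_entries(complete, missing_rate, rng):
    """Return a copy of `complete` with a random fraction of entries set to NaN."""
    masked = complete.astype(float).copy()
    n_missing = int(round(missing_rate * masked.size))
    idx = rng.choice(masked.size, size=n_missing, replace=False)
    masked.flat[idx] = np.nan
    return masked

def impute_zero(data):
    """Stand-in algorithm: replace every missing entry with 0."""
    return np.where(np.isnan(data), 0.0, data)

def impute_row_mean(data):
    """Stand-in algorithm: replace missing entries with the gene (row) mean."""
    row_means = np.nanmean(data, axis=1, keepdims=True)
    # Fall back to the global mean if a whole row happens to be missing.
    row_means = np.where(np.isnan(row_means), np.nanmean(data), row_means)
    return np.where(np.isnan(data), row_means, data)

def nrmse(complete, masked, imputed):
    """NRMSE over the artificially removed entries (assumed measure)."""
    missing = np.isnan(masked)
    err = imputed[missing] - complete[missing]
    return float(np.sqrt(np.mean(err ** 2) / np.var(complete[missing])))

def compare(complete, algorithms, missing_rate=0.05, n_runs=10, seed=0):
    """Average each algorithm's NRMSE over several independent simulation runs."""
    rng = np.random.default_rng(seed)
    scores = {name: [] for name in algorithms}
    for _ in range(n_runs):
        masked = mask_entries(complete, missing_rate, rng)
        for name, impute in algorithms.items():
            scores[name].append(nrmse(complete, masked, impute(masked)))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}

# Synthetic "complete" matrix: 200 genes x 12 samples with gene-specific baselines.
demo_rng = np.random.default_rng(1)
baseline = demo_rng.normal(0.0, 2.0, size=(200, 1))
complete = baseline + demo_rng.normal(0.0, 0.5, size=(200, 12))

result = compare(complete, {"zero": impute_zero, "row_mean": impute_row_mean})
best = min(result, key=result.get)  # lower NRMSE is better
print(result, "best:", best)
```

In this sketch the algorithm with the lowest mean NRMSE across runs is selected. A real comparison would start from an actual complete expression matrix and would add biologically motivated measures alongside the statistical one, as the abstract describes.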
Pages: 13
Related papers
50 in total
  • [1] WIMP: Web server tool for missing data imputation
    Urda, D.
    Subirats, J. L.
    Garcia-Laencina, P. J.
    Franco, L.
    Sancho-Gomez, J. L.
    Jerez, J. M.
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2012, 108 (03) : 1247 - 1254
  • [2] Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data
    Sehgal, MSB
    Gondal, I
    Dooley, LS
    BIOINFORMATICS, 2005, 21 (10) : 2417 - 2423
  • [3] An Efficient Technique for Missing value Imputation in Microarray Gene Expression Data
    Valarmathie, P.
    Dinakaran, K.
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND SYSTEMS (ICCCS'14), 2014, : 73 - 80
  • [4] A Review on Missing Value Imputation Algorithms for Microarray Gene Expression Data
    Moorthy, Kohbalan
    Mohamad, Mohd Saberi
    Deris, Safaai
    CURRENT BIOINFORMATICS, 2014, 9 (01) : 18 - 22
  • [5] Triple Imputation for Microarray Missing Value Estimation
    He, Chong
    Li, Hui-Hui
    Zhao, Changbo
    Li, Guo-Zheng
    Zhang, Wei
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015, : 208 - 213
  • [6] Sequential local least squares imputation estimating missing value of microarray data
    Zhang, Xiaobai
    Song, Xiaofeng
    Wang, Huinan
    Zhan, Huanping
    COMPUTERS IN BIOLOGY AND MEDICINE, 2008, 38 (10) : 1112 - 1120
  • [7] Cluster-based KNN Missing Value Imputation for DNA Microarray Data
    Keerin, Phimmarin
    Kurutach, Werasak
    Boongoen, Tossapon
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 445 - 450
  • [8] An efficient ensemble method for missing value imputation in microarray gene expression data
    Xinshan Zhu
    Jiayu Wang
    Biao Sun
    Chao Ren
    Ting Yang
    Jie Ding
    BMC Bioinformatics, 22
  • [9] Missing value imputation improves clustering and interpretation of gene expression microarray data
    Tuikkala, Johannes
    Elo, Laura L.
    Nevalainen, Olli S.
    Aittokallio, Tero
    BMC BIOINFORMATICS, 2008, 9 (1)
  • [10] An efficient ensemble method for missing value imputation in microarray gene expression data
    Zhu, Xinshan
    Wang, Jiayu
    Sun, Biao
    Ren, Chao
    Yang, Ting
    Ding, Jie
    BMC BIOINFORMATICS, 2021, 22 (01)