Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data

Cited by: 76
Authors
Sehgal, MSB [1]
Gondal, I [1]
Dooley, LS [1]
Affiliation
[1] Monash Univ, Gippsland Sch Comp & Informat Technol, Clayton, Vic 3842, Australia
DOI
10.1093/bioinformatics/bti345
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate them as accurately as possible before applying these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be undertaken accurately. This paper presents an innovative missing value imputation algorithm called collateral missing value estimation (CMVE), which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least squares regression and linear programming methods.
Results: The new CMVE algorithm has been compared with existing estimation techniques, including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested on estimating missing values in three separate non-time series (ovarian cancer based) datasets and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which contained 1.7% actual missing values, to test the hypothesis that CMVE performs better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and more robust estimation of missing values than the other methods for both time series and non-time series data, at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm.
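The evaluation protocol described in the abstract (hide entries at random with a given probability, impute them, then score with NRMS error) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details taken from the paper: the NRMS formula (RMS of the estimation error divided by the RMS of the true values), the function names, and the use of a simple row-mean imputer as a stand-in. The stand-in is not CMVE, BPCA, LSImpute or KNN; it only marks where a real imputation method would be plugged in.

```python
import numpy as np


def nrms_error(true_values, estimated_values):
    """NRMS error, assuming the definition RMS(true - estimate) / RMS(true)."""
    true_values = np.asarray(true_values, dtype=float)
    estimated_values = np.asarray(estimated_values, dtype=float)
    rms_of_error = np.sqrt(np.mean((true_values - estimated_values) ** 2))
    rms_of_truth = np.sqrt(np.mean(true_values ** 2))
    return rms_of_error / rms_of_truth


def mask_at_random(data, missing_prob, rng):
    """Hide each entry independently with probability missing_prob."""
    mask = rng.random(data.shape) < missing_prob
    corrupted = data.copy()
    corrupted[mask] = np.nan
    return corrupted, mask


def row_mean_impute(corrupted):
    """Placeholder baseline (NOT CMVE): fill each gap with its row (gene) mean."""
    observed = ~np.isnan(corrupted)
    counts = observed.sum(axis=1)
    sums = np.where(observed, corrupted, 0.0).sum(axis=1)
    # Fall back to the global mean for rows with no observed entries.
    global_mean = sums.sum() / max(counts.sum(), 1)
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, corrupted, row_means[:, None])


def evaluate(data, missing_probs, impute=row_mean_impute, seed=0):
    """Return {missing probability: NRMS error over the hidden entries}."""
    rng = np.random.default_rng(seed)
    results = {}
    for prob in missing_probs:
        corrupted, mask = mask_at_random(data, prob, rng)
        if not mask.any():
            continue  # nothing was hidden, so there is nothing to score
        filled = impute(corrupted)
        results[prob] = nrms_error(data[mask], filled[mask])
    return results


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix (genes x samples).
    synthetic = np.random.default_rng(1).normal(loc=2.0, size=(200, 12))
    for prob, score in evaluate(synthetic, [0.01, 0.05, 0.1, 0.2]).items():
        print(f"missing probability {prob}: NRMS error {score:.3f}")
```

Scoring only the hidden entries, rather than the whole matrix, keeps the error from being diluted by values that were never missing. Any other imputer with the same signature (an array containing NaN in, a filled array out) can be passed through the `impute` argument.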
Pages: 2417-2423
Number of pages: 7
Related papers
50 in total
  • [11] A robust missing value imputation method for noisy data
    Bing Zhu
    Changzheng He
    Panos Liatsis
    Applied Intelligence, 2012, 36 : 61 - 74
  • [12] A robust missing value imputation method for noisy data
    Zhu, Bing
    He, Changzheng
    Liatsis, Panos
    APPLIED INTELLIGENCE, 2012, 36 (01) : 61 - 74
  • [13] A collateral missing value estimation algorithm for DNA microarrays
    Sehgal, MSB
    Gondal, I
    Dooley, L
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 377 - 380
  • [14] An Optimization Algorithm for Missing Value Imputation in Microarray Based on Integrated Information
    Liu, Feng
    Zhang, Yiding
    Liu, Zeming
    Gao, Meng
    FUZZY SYSTEMS AND DATA MINING V (FSDM 2019), 2019, 320 : 55 - 64
  • [15] An Efficient Technique for Missing value Imputation in Microarray Gene Expression Data
    Valarmathie, P.
    Dinakaran, K.
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND SYSTEMS (ICCCS'14), 2014, : 73 - 80
  • [16] A Review on Missing Value Imputation Algorithms for Microarray Gene Expression Data
    Moorthy, Kohbalan
    Mohamad, Mohd Saberi
    Deris, Safaai
    CURRENT BIOINFORMATICS, 2014, 9 (01) : 18 - 22
  • [17] A New Method to Missing Value Imputation for Immunosignature Data
    Koshechkin, A. A.
    Andryushchenko, V. S.
    Zamyatin, A. V.
    SOVREMENNYE TEHNOLOGII V MEDICINE, 2019, 11 (02) : 19 - 23
  • [18] Missing value estimation for DNA microarray gene expression data: local least squares imputation
    Kim, H
    Golub, GH
    Park, H
    BIOINFORMATICS, 2005, 21 (02) : 187 - 198
  • [19] Incorporating Nonlinear Relationships in Microarray Missing Value Imputation
    Yu, Tianwei
    Peng, Hesen
    Sun, Wei
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2011, 8 (03) : 723 - 731
  • [20] Imputation Algorithm Based on Copula for Missing Value in Timeseries Data
    Afrianti, Y. S.
    Indratno, S. W.
    Pasaribu, U. S.
    2014 2ND INTERNATIONAL CONFERENCE ON TECHNOLOGY, INFORMATICS, MANAGEMENT, ENGINEERING, AND ENVIRONMENT (TIME-E 2014), 2014, : 252 - 257