Impact of missing data imputation methods on gene expression clustering and classification

Cited by: 68
Authors
de Souto, Marcilio C. P. [1]
Jaskowiak, Pablo A. [2]
Costa, Ivan G. [3,4]
Affiliations
[1] Univ Orleans, INSA Ctr Val Loire, LIFO EA 4022, Orleans, France
[2] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP, Brazil
[3] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
[4] Rhein Westfal TH Aachen, Sch Med, Inst Biomed Engn, IZKF Computat Biol Res Grp, Aachen, Germany
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Funding
São Paulo Research Foundation, Brazil;
Keywords
Missing data; Imputation; Clustering; Classification; Gene expression;
DOI
10.1186/s12859-015-0494-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Several methods for imputing missing values in gene expression data have been proposed in the literature. In the past few years, researchers have devoted considerable effort to systematically evaluating these imputation algorithms. Initially, most algorithms were assessed mainly on imputation accuracy, using metrics such as the root mean squared error (RMSE). However, it has become clear that the quality of the estimated expression values should also be judged in more practical terms: for example, by the method's ability to preserve the significant genes in a dataset, or by its effect on discriminative/predictive power in classification and clustering.

Results and conclusions: We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, using 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering and classification methods. Our results suggest that the imputation methods evaluated have only a minor impact on classification and downstream clustering analyses. Simple methods, such as replacing missing values with the mean or median, performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/.
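The abstract notes that accuracy has traditionally been measured with RMSE and that simple mean or median replacement performed as well as more complex strategies. The following is a minimal sketch of both ideas, under stated assumptions: the expression matrix is a list of rows with genes as rows and samples as columns, missing entries are `None`, each gap is filled from the observed values of the same gene, and RMSE is computed only over entries that were deliberately masked. The function names (`mask_entries`, `impute_rows`, `rmse_on_masked`) are hypothetical; this is not the authors' implementation, and their exact procedure may differ.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]
Position = Tuple[int, int]  # (gene/row index, sample/column index)


def mask_entries(data: Sequence[Sequence[float]], positions: Sequence[Position]) -> List[Row]:
    """Return a copy of a complete matrix with the given positions set to None.

    Hypothetical helper: simulates missing values so that imputation can be
    scored against the known true values.
    """
    masked: List[Row] = [list(row) for row in data]
    for i, j in positions:
        masked[i][j] = None
    return masked


def impute_rows(data: Sequence[Row], strategy: str = "mean") -> List[List[float]]:
    """Fill each None with the mean or median of the observed values in its row.

    Assumption: rows are genes, so each gap is filled from that gene's
    expression across the other samples.
    """
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    summarize = statistics.mean if strategy == "mean" else statistics.median
    imputed: List[List[float]] = []
    for row in data:
        observed = [v for v in row if v is not None]
        if not observed:
            raise ValueError("cannot impute a row with no observed values")
        fill = summarize(observed)
        imputed.append([fill if v is None else v for v in row])
    return imputed


def rmse_on_masked(
    truth: Sequence[Sequence[float]],
    imputed: Sequence[Sequence[float]],
    positions: Sequence[Position],
) -> float:
    """Root mean squared error computed only over the masked positions."""
    if not positions:
        raise ValueError("no masked positions to evaluate")
    squared = [(imputed[i][j] - truth[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))
```

For example, masking a few known values, imputing with `strategy="median"`, and calling `rmse_on_masked` on the same positions gives the error of the median strategy on those entries; repeating with `"mean"` allows a direct comparison. Filling from each gene's row is one common choice; filling from each sample's column is an equally simple variant.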
Pages: 9