Deep learning for missing value imputation of continuous data and the effect of data discretization

Cited by: 50
Authors
Lin, Wei-Chao [1, 2]
Tsai, Chih-Fong [3 ]
Zhong, Jia Rong [3 ]
Affiliations
[1] Chang Gung Univ, Dept Informat Management, Taoyuan, Taiwan
[2] Chang Gung Mem Hosp Linkou, Dept Thorac Surg, Taoyuan, Taiwan
[3] Natl Cent Univ, Dept Informat Management, Taoyuan, Taiwan
Keywords
Data science; Machine learning; Deep learning; Missing value imputation; Data discretization; CLASSIFICATION; MACHINES
DOI
10.1016/j.knosys.2021.108079
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Real-world datasets are often incomplete and contain missing attribute values, and many data mining and machine learning techniques cannot handle incomplete datasets directly. Missing value imputation is the principal solution: a learning model is constructed to estimate values that replace the missing ones. Deep learning techniques have been employed for missing value imputation and have demonstrated their superiority over many other well-known imputation methods. However, very few studies have assessed the imputation performance of deep learning techniques for tabular or structured data with continuous values. Moreover, the effect on the imputation results when the continuous data need to be discretized has never been examined. In this paper, two supervised deep neural networks, i.e., multilayer perceptron (MLP) and deep belief networks (DBN), are compared for missing value imputation. In addition, two differently ordered combinations of the data discretization and imputation steps are examined. The results show that MLP and DBN significantly outperform the baseline imputation methods based on the mean, KNN, CART, and SVM, with DBN performing the best. When the continuous data are discretized, the order in which the two steps are combined matters less than the choice of imputation algorithm. That is, the final performance is much better with DBN imputation than with the other imputation methods, regardless of whether discretization is performed in the first or second step. (c) 2021 Elsevier B.V. All rights reserved.
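The two step orderings mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it swaps in mean and mode imputation (simple baselines) for the MLP and DBN imputers, and it assumes equal-width binning as the discretizer, which the abstract does not specify. All function names here are hypothetical.

```python
from statistics import mean, mode


def equal_width_bins(values, n_bins=3):
    """Map each observed value to a bin index 0..n_bins-1; keep None as None.

    Equal-width binning is an assumption made for illustration only.
    """
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    return [
        None if v is None else min(int((v - lo) / width), n_bins - 1)
        for v in values
    ]


def impute_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace missing entries with the most frequent observed entry."""
    fill = mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def impute_then_discretize(values, n_bins=3):
    """Ordering A: fill continuous values first, then bin the completed column."""
    return equal_width_bins(impute_mean(values), n_bins)


def discretize_then_impute(values, n_bins=3):
    """Ordering B: bin the observed values first, then fill the missing bins."""
    return impute_mode(equal_width_bins(values, n_bins))


# Toy column with one missing value (None); the data are made up.
column = [1.0, 2.0, None, 9.0, 10.0, 2.5]
print(impute_then_discretize(column))  # [0, 0, 1, 2, 2, 0]
print(discretize_then_impute(column))  # [0, 0, 0, 2, 2, 0]
```

On this toy column the two orderings assign the missing entry to different bins, which shows why the order of the steps is a design choice worth evaluating. It says nothing about which ordering or imputer performs better in practice.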
Pages: 9
Related papers
50 in total
  • [1] Combining data discretization and missing value imputation for incomplete medical datasets
    Huang, Min-Wei
    Tsai, Chih-Fong
    Tsui, Shu-Ching
    Lin, Wei-Chao
    [J]. PLOS ONE, 2023, 18 (11)
  • [2] "Deep" Learning for Missing Value Imputation in Tables with Non-Numerical Data
    Biessmann, Felix
    Salinas, David
    Schelter, Sebastian
    Schmidt, Philipp
    Lange, Dustin
    [J]. CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018 : 2017 - 2025
  • [3] Adaptive Deep Incremental Learning - Assisted Missing Data Imputation for Streaming Data
    Syavasya, C. V. S. R.
    Lakshmi, M. A.
    [J]. JOURNAL OF INTERCONNECTION NETWORKS, 2022, 22 (SUPP02)
  • [4] Missing Data Imputation for Supervised Learning
    Poulos, Jason
    Valle, Rafael
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2018, 32 (02) : 186 - 196
  • [5] Missing-Value Imputation of Continuous Missing Based on Deep Imputation Network Using Correlations among Multiple IoT Data Streams in a Smart Space
    Lee, Minseok
    An, Jihoon
    Lee, Younghee
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (02) : 289 - 298
  • [6] Imputation of continuous missing values in profile data
    Yang, Luo
    Wang, Kaibo
    [J]. QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2022, 38 (07) : 3644 - 3662
  • [7] Missing value imputation strategies for metabolomics data
    Grace Armitage, Emily
    Godzien, Joanna
    Alonso-Herranz, Vanesa
    Lopez-Gonzalvez, Angeles
    Barbas, Coral
    [J]. ELECTROPHORESIS, 2015, 36 (24) : 3050 - 3060
  • [8] Missing Value Imputation: With Application to Handwriting Data
    Xu, Zhen
    Srihari, Sargur N.
    [J]. DOCUMENT RECOGNITION AND RETRIEVAL XXII, 2015, 9402
  • [9] Missing data in bioarchaeology II: A test of ordinal and continuous data imputation
    Wissler, Amanda
    Blevins, Kelly E.
    Buikstra, Jane E.
    [J]. AMERICAN JOURNAL OF BIOLOGICAL ANTHROPOLOGY, 2022, 179 (03) : 349 - 364
  • [10] Improved generative adversarial network with deep metric learning for missing data imputation
    Al-taezi, Mohammed Ali
    Wang, Yu
    Zhu, Pengfei
    Hu, Qinghua
    Al-badwi, Abdulrahman
    [J]. NEUROCOMPUTING, 2024, 570