Missing Data Imputation: A Survey

Cited by: 2
Author
Kelkar, Bhagyashri Abhay [1]
Affiliation
[1] KITs Coll Engn, Kolhapur, Maharashtra, India
Keywords
CLUSLINK; High-Dimensional Data; Missing Data; Multiple Imputation; Subspace Clustering; MULTIPLE IMPUTATION; ALGORITHM; VALUES; SPARSE;
DOI
10.4018/IJDSST.292446
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Real-world datasets often contain missing values for various reasons. Incomplete datasets can cause serious problems for the machine learning algorithms and decision support systems that depend on them, including high computational cost, skewed output, and invalid deductions. Various solutions exist to mitigate this issue; the most popular strategy is to estimate the missing values with inferential techniques such as linear regression, decision trees, or Bayesian inference. This paper discusses the missing data problem in detail and gives a comprehensive review of the approaches used to tackle it. It concludes with a discussion of the effectiveness of three imputation methods in the context of subspace clustering: multiple linear regression (MLR), predictive mean matching (PMM), and classification and regression trees (CART). Experimental results on real benchmark datasets and high-dimensional synthetic datasets indicate that the MLR-based imputation method is more efficient on high-dimensional incomplete datasets.
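As a rough illustration of the regression-based strategy the abstract mentions, the following minimal sketch fills the missing entries of one column by fitting an ordinary least-squares model on the rows where that column is observed. It is not the paper's implementation: the function name `mlr_impute`, the toy data, and the assumption that all predictor columns are fully observed are hypothetical choices made for this example.

```python
import numpy as np

def mlr_impute(X, target_col):
    """Fill NaNs in one column via linear regression on the other columns.

    Simplifying assumption (not from the paper): every other column is
    fully observed, so it can be used directly as a predictor.
    """
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, target_col])
    predictors = np.delete(X, target_col, axis=1)
    # Design matrix with a leading intercept column.
    A = np.column_stack([np.ones(len(X)), predictors])
    # Fit the regression only on rows where the target is observed.
    coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, target_col], rcond=None)
    # Replace each missing entry with the model's prediction.
    X[missing, target_col] = A[missing] @ coef
    return X

# Toy data in which the third column equals x1 + 2 * x2 on observed rows.
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 9.0],
    [4.0, 2.0, np.nan],
    [5.0, 5.0, 15.0],
])
completed = mlr_impute(data, target_col=2)
```

Because the toy column is an exact linear function of the predictors, the imputed value comes out at approximately 8.0. Real data would carry noise, and methods such as PMM or CART replace the prediction step with donor matching or tree-based prediction respectively.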
Pages: 20