Missing Data Imputation: A Survey

Cited by: 2
Author
Kelkar, Bhagyashri Abhay [1]
Affiliation
[1] KITs Coll Engn, Kolhapur, Maharashtra, India
Keywords
CLUSLINK; High-Dimensional Data; Missing Data; Multiple Imputation; Subspace Clustering; MULTIPLE IMPUTATION; ALGORITHM; VALUES; SPARSE
DOI
10.4018/IJDSST.292446
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many real-world datasets contain missing values for various reasons. Such incomplete datasets can pose severe problems for the machine learning algorithms and decision support systems built on them, leading to high computational cost, skewed output, and invalid deductions. Various solutions exist to mitigate this issue; the most popular strategy is to estimate the missing values by applying inferential techniques such as linear regression, decision trees, or Bayesian inference. In this paper, the missing data problem is discussed in detail, with a comprehensive review of the approaches used to tackle it. The paper concludes with a discussion of the effectiveness of three imputation methods in the context of subspace clustering: imputation based on multiple linear regression (MLR), predictive mean matching (PMM), and classification and regression trees (CART). The experimental results obtained on real benchmark datasets and high-dimensional synthetic datasets indicate that the MLR-based imputation method is more efficient on high-dimensional incomplete datasets.
Pages: 20