Analysis of Machine Learning Based Imputation of Missing Data

Cited by: 1
Authors
Rizvi, Syed Tahir Hussain [1]
Latif, Muhammad Yasir [2]
Amin, Muhammad Saad [3]
Telmoudi, Achraf Jabeur [4]
Shah, Nasir Ali [5]
Affiliations
[1] Univ Stavanger, Dept Elect Engn & Comp Sci, Stavanger, Norway
[2] Educat Inc, Islamabad, Pakistan
[3] Univ Torino, Dipartimento Informat, Turin, Italy
[4] Univ Tunis, Natl Higher Engn Sch Tunis ENSIT, LISIER Lab, Tunis, Tunisia
[5] Politecn Torino, Dipartimento Elettron & Telecomunicazioni, Turin, Italy
Keywords
Imputation; imputation using KNN; imputation using SKNN; missing data; statistical imputation; PREVENTION;
DOI
10.1080/01969722.2023.2247257
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Missing values in a dataset can degrade data analysis and classification. They are usually handled either by deletion, which reduces the number of data records, or by imputation, which risks filling in incorrectly predicted values. The quality of imputed data can improve significantly when missing values are generated accurately with machine learning (ML) algorithms. This work analyzes ML-based algorithms for missing data imputation, using the K-nearest neighbors (KNN) and Sequential KNN (SKNN) algorithms to impute missing values in datasets. Datasets handled with a statistical deletion approach (List-wise Deletion, LD) and with the ML-based imputation methods (KNN and SKNN) are then evaluated and compared using different ML classifiers (Support Vector Machine (SVM) and Decision Tree (DT)) to assess the effectiveness of the imputed data. The approaches are compared in terms of accuracy. The results show that SKNN outperforms both the LD-based approach and the KNN method in almost every dataset with both classification algorithms (SVM and DT).
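The three ways of handling missing data named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which this record does not include. The function names (`listwise_delete`, `knn_impute`, `sknn_impute`), the use of `None` as the missing-value marker, the distance measure, and the toy data are all assumptions made for illustration. The sketch also assumes the data contain at least one complete row. The sequential variant shown here imputes the rows with the fewest missing values first and then reuses each imputed row as a donor for later rows, which is one common reading of SKNN.

```python
import math

def listwise_delete(rows):
    """List-wise deletion: drop every row that has a missing value (None)."""
    return [r for r in rows if None not in r]

def _distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def _impute_row(row, donors, k):
    """Fill each missing entry with the mean of the k nearest donor rows."""
    nearest = sorted(donors, key=lambda d: _distance(row, d))[:k]
    return [
        sum(d[j] for d in nearest) / len(nearest) if v is None else v
        for j, v in enumerate(row)
    ]

def knn_impute(rows, k=2):
    """Plain KNN imputation: only the originally complete rows act as donors."""
    donors = listwise_delete(rows)
    return [_impute_row(r, donors, k) if None in r else list(r) for r in rows]

def sknn_impute(rows, k=2):
    """Sequential KNN (assumed variant): each imputed row joins the donor pool."""
    donors = listwise_delete(rows)
    result = [list(r) for r in rows]
    # Process incomplete rows in order of increasing number of missing values.
    order = sorted(
        (i for i, r in enumerate(rows) if None in r),
        key=lambda i: rows[i].count(None),
    )
    for i in order:
        filled = _impute_row(rows[i], donors, k)
        result[i] = filled
        donors.append(filled)  # later rows may use this row as a neighbor
    return result

# Hypothetical toy data: three complete rows and two rows missing column 1.
rows = [
    [1.0, 2.0],
    [2.0, 4.0],
    [10.0, 20.0],
    [9.0, None],
    [11.0, None],
]

print(len(listwise_delete(rows)))  # number of rows left after deletion
print(knn_impute(rows)[4])         # last row, donors fixed
print(sknn_impute(rows)[4])        # last row, donors grow sequentially
```

On this toy data the two imputation variants disagree on the last row, because the sequential version can draw on the row it imputed just before. Comparing the downstream classifier accuracy on each resulting dataset, as the abstract describes, is outside the scope of this sketch.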
Pages: 15
Related papers
50 in total
  • [31] A simulation study on missing data imputation for dichotomous variables using statistical and machine learning methods
    Yingfeng Ge
    Zhiwei Li
    Jinxin Zhang
    [J]. Scientific Reports, 13
  • [32] A simulation study on missing data imputation for dichotomous variables using statistical and machine learning methods
    Ge, Yingfeng
    Li, Zhiwei
    Zhang, Jinxin
    [J]. SCIENTIFIC REPORTS, 2023, 13 (01)
  • [33] A novel machine learning-based imputation strategy for missing data in step-stress accelerated degradation test
    Li, Yaqiu
    Zhou, Qijie
    Fan, Ye
    Pan, Guangze
    Dai, Zongbei
    Lei, Baimao
    [J]. HELIYON, 2024, 10 (04)
  • [34] Missing data imputation using statistical and machine learning methods in a real breast cancer problem
    Jerez, Jose M.
    Molina, Ignacio
    Garcia-Laencina, Pedro J.
    Alba, Emilio
    Ribelles, Nuria
    Martin, Miguel
    Franco, Leonardo
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2010, 50 (02) : 105 - 115
  • [35] Multiple imputation of missing data for survey data analysis
    Lupo, Coralie
    Le Bouquin, Sophie
    Michel, Virginie
    Colin, Pierre
    Chauvin, Claire
    [J]. EPIDEMIOLOGIE ET SANTE ANIMALE, 2008, (53) : 73 - 83
  • [36] Missing Data Imputation Algorithm for Transmission Systems Based on Multivariate Imputation With Principal Component Analysis
    Sim, Yeon-Sub
    Hwang, Jae-Sang
    Mun, Sung-Duk
    Kim, Tae-Joon
    Chang, Seung Jin
    [J]. IEEE ACCESS, 2022, 10 : 83195 - 83203
  • [37] The use of multiple imputation for the analysis of missing data
    Sinharay, S
    Stern, HS
    Russell, D
    [J]. PSYCHOLOGICAL METHODS, 2001, 6 (04) : 317 - 329
  • [38] Regression multiple imputation for missing data analysis
    Yu, Lili
    Liu, Liang
    Peace, Karl E.
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2020, 29 (09) : 2647 - 2664
  • [39] IMPUTATION OF MISSING DATA
    Lunt, M.
    [J]. ANNALS OF THE RHEUMATIC DISEASES, 2014, 73 : 49 - 49
  • [40] Parametric fractional imputation for missing data analysis
    Kim, Jae Kwang
    [J]. BIOMETRIKA, 2011, 98 (01) : 119 - 132