Analysis of Machine Learning Based Imputation of Missing Data

Cited by: 1
Authors
Rizvi, Syed Tahir Hussain [1]
Latif, Muhammad Yasir [2]
Amin, Muhammad Saad [3]
Telmoudi, Achraf Jabeur [4]
Shah, Nasir Ali [5]
Affiliations
[1] Univ Stavanger, Dept Elect Engn & Comp Sci, Stavanger, Norway
[2] Educat Inc, Islamabad, Pakistan
[3] Univ Torino, Dipartimento Informat, Turin, Italy
[4] Univ Tunis, Natl Higher Engn Sch Tunis ENSIT, LISIER Lab, Tunis, Tunisia
[5] Politecn Torino, Dipartimento Elettron & Telecomunicazioni, Turin, Italy
Keywords
Imputation; imputation using KNN; imputation using SKNN; missing data; statistical imputation; PREVENTION;
DOI
10.1080/01969722.2023.2247257
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The presence of missing data in a dataset can degrade data analysis and classification. Missing data are usually handled by either deletion or imputation, which respectively reduce the number of data records or risk filling in incorrectly predicted values. The quality of imputed data can be significantly improved if missing values are generated accurately using machine learning (ML) algorithms. This work analyzes ML-based algorithms for missing data imputation. The K-nearest neighbors (KNN) and Sequential KNN (SKNN) algorithms are used to impute missing values in datasets. Datasets whose missing values are handled by a statistical deletion approach (List-wise Deletion (LD)) and by the ML-based imputation methods (KNN and SKNN) are then used to train and test two ML classifiers, Support Vector Machine (SVM) and Decision Tree (DT), to evaluate the effectiveness of the imputed data. The methods are compared in terms of classification accuracy. The results show that SKNN outperforms both LD and KNN at handling missing data on almost every dataset with both classifiers.
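The three strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the use of `None` for a missing value, Euclidean distance over the observed columns, and the neighbour mean as the fill value are all assumptions made for illustration. The sequential variant follows the commonly described SKNN idea (impute the records with the fewest gaps first, then let each imputed record serve as a neighbour for later ones), which the abstract itself does not spell out.

```python
import math

def _distance(row, other, cols):
    """Euclidean distance restricted to the columns listed in `cols`."""
    return math.sqrt(sum((row[c] - other[c]) ** 2 for c in cols))

def listwise_delete(rows):
    """List-wise deletion: drop every record with at least one missing value."""
    return [r for r in rows if None not in r]

def _impute_row(row, pool, k):
    """Fill the gaps in `row` with the mean of its k nearest rows in `pool`."""
    observed = [c for c, v in enumerate(row) if v is not None]
    neighbours = sorted(pool, key=lambda p: _distance(row, p, observed))[:k]
    return [
        v if v is not None else sum(n[c] for n in neighbours) / len(neighbours)
        for c, v in enumerate(row)
    ]

def knn_impute(rows, k=2):
    """Plain KNN: neighbours always come from the originally complete rows."""
    pool = listwise_delete(rows)
    return [_impute_row(r, pool, k) if None in r else list(r) for r in rows]

def sknn_impute(rows, k=2):
    """Sequential KNN (assumed variant): rows with the fewest gaps are
    imputed first, and each imputed row then joins the neighbour pool."""
    pool = [list(r) for r in listwise_delete(rows)]
    result = [list(r) for r in rows]
    incomplete = sorted(
        (i for i, r in enumerate(rows) if None in r),
        key=lambda i: rows[i].count(None),
    )
    for i in incomplete:
        result[i] = _impute_row(rows[i], pool, k)
        pool.append(result[i])
    return result

# Hypothetical toy dataset: three complete records, two with gaps.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [10.0, 11.0, 12.0],
    [1.5, None, 3.5],
    [3.0, None, None],
]

deleted = listwise_delete(data)   # keeps only the three complete records
knn_filled = knn_impute(data)     # all five records, gaps filled from complete rows
sknn_filled = sknn_impute(data)   # all five records, later gaps may use earlier fills
```

In this toy case list-wise deletion discards two of the five records, while both imputation variants keep all five. The last record is filled differently by the two variants because SKNN is allowed to use the already imputed fourth record as a neighbour. Comparing the resulting datasets with a classifier, as the abstract describes, would be a separate step.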
Pages: 15