Analysis of Machine Learning Based Imputation of Missing Data

Cited by: 1
Authors
Rizvi, Syed Tahir Hussain [1]
Latif, Muhammad Yasir [2]
Amin, Muhammad Saad [3]
Telmoudi, Achraf Jabeur [4]
Shah, Nasir Ali [5]
Affiliations
[1] Univ Stavanger, Dept Elect Engn & Comp Sci, Stavanger, Norway
[2] Educat Inc, Islamabad, Pakistan
[3] Univ Torino, Dipartimento Informat, Turin, Italy
[4] Univ Tunis, Natl Higher Engn Sch Tunis ENSIT, LISIER Lab, Tunis, Tunisia
[5] Politecn Torino, Dipartimento Elettron & Telecomunicazioni, Turin, Italy
Keywords
Imputation; imputation using KNN; imputation using SKNN; missing data; statistical imputation; PREVENTION;
DOI
10.1080/01969722.2023.2247257
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data analysis and classification can be affected by the presence of missing data in datasets. Missing data are typically handled by deletion-based or imputation-based methods, which respectively reduce the number of data records or risk imputing incorrectly predicted values. The quality of imputed data can be significantly improved if missing values are generated accurately using machine learning (ML) algorithms. In this work, ML-based algorithms for missing data imputation are analyzed. The K-nearest neighbors (KNN) and Sequential KNN (SKNN) algorithms are used to impute missing values in datasets. Data handled with a statistical deletion approach (List-wise Deletion (LD)) and with the ML-based imputation methods (KNN and SKNN) are then tested and compared using two ML classifiers, Support Vector Machine (SVM) and Decision Tree (DT), to evaluate the effectiveness of the imputed data. The methods are compared in terms of classification accuracy, and the results show that the ML-based imputation method SKNN outperforms both the LD-based approach and the KNN method in handling missing data on almost every dataset with both classifiers (SVM and DT).
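The three ways of handling missing data named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of `None` for a missing value, the distance measure (root mean squared difference over the columns observed in both rows), the choice of complete rows as the neighbor pool, and the toy data are all assumptions made for illustration. The sequential variant follows the usual description of SKNN, in which rows with the fewest missing values are imputed first and each imputed row is then added to the pool of candidate neighbors.

```python
import math

def listwise_delete(rows):
    """List-wise deletion: keep only rows with no missing (None) values."""
    return [list(r) for r in rows if None not in r]

def _distance(a, b):
    """Root mean squared difference over columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def _fill(row, pool, k):
    """Replace each missing entry with the mean of the k nearest pool rows."""
    if not pool:
        raise ValueError("at least one complete row is required")
    neighbours = sorted(pool, key=lambda p: _distance(row, p))[:k]
    return [
        v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
        for j, v in enumerate(row)
    ]

def knn_impute(rows, k=2):
    """KNN imputation: neighbours are always drawn from the complete rows."""
    pool = listwise_delete(rows)
    return [_fill(r, pool, k) if None in r else list(r) for r in rows]

def sknn_impute(rows, k=2):
    """Sequential KNN: impute rows with the fewest gaps first, then add
    each imputed row to the pool so later imputations can reuse it."""
    pool = listwise_delete(rows)
    result = [list(r) for r in rows]
    incomplete = [i for i, r in enumerate(rows) if None in r]
    for i in sorted(incomplete, key=lambda i: rows[i].count(None)):
        result[i] = _fill(rows[i], pool, k)
        pool.append(result[i])
    return result

# Hypothetical toy data: two features, two rows with one missing value each.
rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [4.0, 40.0],
    [2.2, None],
    [None, 16.0],
]

deleted = listwise_delete(rows)   # three complete rows remain
knn = knn_impute(rows, k=2)       # last row: first feature becomes 1.5
sknn = sknn_impute(rows, k=2)     # last row: first feature becomes about 2.1
```

On this toy data, list-wise deletion discards two of the five records, while both imputation methods keep all five. The two imputers differ on the last row because SKNN can use the previously imputed row `[2.2, 15.0]` as a neighbor. Comparing the resulting datasets by classifier accuracy, as the abstract describes, would additionally require class labels and a classifier, which this sketch leaves out.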
Pages: 15
Related papers
50 in total
  • [41] IMPUTATION OF MISSING DATA
    Lunt, M.
    [J]. ANNALS OF THE RHEUMATIC DISEASES, 2014, 73 : 49 - 49
  • [42] Parametric fractional imputation for missing data analysis
    Kim, Jae Kwang
    [J]. BIOMETRIKA, 2011, 98 (01) : 119 - 132
  • [43] Analysis of Suitable Machine Learning Imputation Techniques for Arthritis Profile Data
    Ramasamy, Uma
    Santhoshkumar, Sundar
    [J]. IETE JOURNAL OF RESEARCH, 2024, 70 (01) : 334 - 355
  • [44] Missing data imputation of MAGDAS-9's ground electromagnetism with supervised machine learning and conventional statistical analysis models
    Asraf, Muhammad H.
    Dalila, Nur K. A.
    Tahir, Nooritawati Md
    Abd Latiff, Zatul Iffah
    Jusoh, Mohamad Huzaimy
    Akimasa, Yoshikawa
    [J]. ALEXANDRIA ENGINEERING JOURNAL, 2022, 61 (01) : 937 - 947
  • [45] Imputation of missing sub-hourly precipitation data in a large sensor network: A machine learning approach
    Chivers, Benedict D.
    Wallbank, John
    Cole, Steven J.
    Sebek, Ondrej
    Stanley, Simon
    Fry, Matthew
    Leontidis, Georgios
    [J]. JOURNAL OF HYDROLOGY, 2020, 588
  • [47] The impact of imputation quality on machine learning classifiers for datasets with missing values
    Shadbahr, Tolou
    Roberts, Michael
    Stanczuk, Jan
    Gilbey, Julian
    Teare, Philip
    Dittmer, Soeren
    Thorpe, Matthew
    Torne, Ramon Vinas
    Sala, Evis
    Lio, Pietro
    Patel, Mishal
    Preller, Jacobus
    Rudd, James H. F.
    Mirtti, Tuomas
    Rannikko, Antti Sakari
    Aston, John A. D.
    Tang, Jing
    Schonlieb, Carola-Bibiane
    [J]. COMMUNICATIONS MEDICINE, 2023, 3 (01):
  • [48] Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data
    Kim, Minkyung
    Park, Sangdon
    Lee, Joohyung
    Joo, Yongjae
    Choi, Jun Kyun
    [J]. ENERGIES, 2017, 10 (10)
  • [49] MIDA: a Web Tool for MIssing DAta Imputation based on a Boosted and Incremental Learning Algorithm
    Acampora, Giovanni
    Vitiello, Autilia
    Siciliano, Roberta
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2020,
  • [50] A comparative analysis of missing data imputation techniques on sedimentation data
    Loh, Wing Son
    Ling, Lloyd
    Chin, Ren Jie
    Lai, Sai Hin
    Loo, Kar Kuan
    Sen Seah, Choon
    [J]. AIN SHAMS ENGINEERING JOURNAL, 2024, 15 (06)