Analysis of Machine Learning Based Imputation of Missing Data

Cited by: 1
Authors
Rizvi, Syed Tahir Hussain [1]
Latif, Muhammad Yasir [2]
Amin, Muhammad Saad [3]
Telmoudi, Achraf Jabeur [4]
Shah, Nasir Ali [5]
Affiliations
[1] Univ Stavanger, Dept Elect Engn & Comp Sci, Stavanger, Norway
[2] Educat Inc, Islamabad, Pakistan
[3] Univ Torino, Dipartimento Informat, Turin, Italy
[4] Univ Tunis, Natl Higher Engn Sch Tunis ENSIT, LISIER Lab, Tunis, Tunisia
[5] Politecn Torino, Dipartimento Elettron & Telecomunicazioni, Turin, Italy
Keywords
Imputation; imputation using KNN; imputation using SKNN; missing data; statistical imputation; PREVENTION
DOI
10.1080/01969722.2023.2247257
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data analysis and classification can be affected by missing values in datasets. Missing data are usually handled either by deletion, which reduces the number of data records, or by imputation, which risks filling gaps with incorrectly predicted values. The quality of imputed data can be improved significantly if missing values are estimated accurately using machine learning (ML) algorithms. This work analyzes ML-based algorithms for missing data imputation, using the K-nearest neighbors (KNN) and Sequential KNN (SKNN) algorithms to impute missing values. Datasets in which missing values are handled by a statistical deletion approach (listwise deletion, LD) and by ML-based imputation (KNN and SKNN) are then classified with two ML classifiers, Support Vector Machine (SVM) and Decision Tree (DT), to evaluate the effectiveness of each method. The methods are compared in terms of classification accuracy, and the results show that SKNN outperforms both LD and KNN in handling missing data in almost every dataset with both classifiers.
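The three missing-data strategies named in the abstract can be contrasted in a minimal, dependency-free sketch. It assumes a common formulation of SKNN (incomplete records are processed in order of increasing number of missing values, and each imputed record is added to the neighbor pool), and it simplifies both KNN variants to an unweighted neighbor mean with Euclidean distance over co-observed features. The function names `listwise_deletion`, `knn_impute`, and `sknn_impute` are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
from typing import List, Optional

# A record is a list of numeric features; None marks a missing value.
Row = List[Optional[float]]


def listwise_deletion(data: List[Row]) -> List[Row]:
    """LD: keep only the records with no missing values."""
    return [row for row in data if None not in row]


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over the features observed in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def _impute_row(row: Row, reference: List[Row], k: int) -> Row:
    """Fill each gap in `row` with the unweighted mean of that feature over
    the k reference records nearest to `row` (a simplifying assumption)."""
    neighbors = sorted(reference, key=lambda r: _distance(row, r))[:k]
    filled = list(row)
    for j, value in enumerate(row):
        if value is None:
            observed = [r[j] for r in neighbors if r[j] is not None]
            filled[j] = sum(observed) / len(observed) if observed else None
    return filled


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """KNN: impute every incomplete record independently, searching only
    the records that were complete in the original data."""
    complete = listwise_deletion(data)
    return [row if None not in row else _impute_row(row, complete, k) for row in data]


def sknn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """SKNN (assumed formulation): impute incomplete records in order of
    increasing number of missing values, adding each imputed record to the
    neighbor pool so later records can use it."""
    pool = listwise_deletion(data)  # a new list, so appending leaves `data` intact
    result = list(data)
    incomplete = [i for i, row in enumerate(data) if None in row]
    incomplete.sort(key=lambda i: sum(v is None for v in data[i]))
    for i in incomplete:
        result[i] = _impute_row(data[i], pool, k)
        pool.append(result[i])
    return result


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [2.0, 3.0, 4.0],
        [10.0, 11.0, 12.0],
        [1.5, None, 3.5],
        [None, None, 3.4],
    ]
    print(len(listwise_deletion(data)))  # 3 complete records survive LD
    print(knn_impute(data, k=2)[4])      # [1.5, 2.5, 3.4]
    print(sknn_impute(data, k=2)[4])     # [1.25, 2.25, 3.4]
```

On this toy data the last record is imputed differently by the two methods: SKNN can use the already-imputed fourth record, which is closer on the single observed feature than any originally complete record. Reproducing the comparison described in the abstract would additionally require training SVM and DT classifiers on each version of the data, which this sketch does not do.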
Pages: 15