SICE: an improved missing data imputation technique

Cited by: 81
Authors
Khan, Shahidul Islam [1, 2]
Hoque, Abu Sayed Md Latiful [1]
Affiliations
[1] Bangladesh Univ Engn & Technol, Dept CSE, Dhaka, Bangladesh
[2] Int Islamic Univ Chittagong, Dept CSE, Chittagong, Bangladesh
Keywords
Missing Data Imputation; Single Imputation; Multiple Imputation; MICE; Data Analytics; Regression
DOI
10.1186/s40537-020-00313-w
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In data analytics, missing data is a factor that degrades performance. Incorrect imputation of missing values could lead to a wrong prediction. In this era of big data, when a massive volume of data is generated every second and the utilization of these data is a major concern to stakeholders, handling missing values efficiently becomes more important. In this paper, we propose a new technique for missing data imputation, which is a hybrid of single and multiple imputation techniques. We propose an extension of the popular Multivariate Imputation by Chained Equations (MICE) algorithm in two variations, to impute categorical and numeric data. We also implemented twelve existing algorithms to impute binary, ordinal, and numeric missing values. We collected sixty-five thousand real health records from different hospitals and diagnostic centers in Bangladesh, maintaining the privacy of the data. We also collected three public datasets from the UCI Machine Learning Repository, ETH Zurich, and Kaggle. We compared the performance of our proposed algorithms with existing algorithms using these datasets. Experimental results show that our proposed algorithm achieves a 20% higher F-measure for binary data imputation and 11% less error for numeric data imputation than its competitors, with similar execution time.
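The chained-equations idea behind MICE, which the abstract extends, can be illustrated with a minimal sketch. This is not the authors' SICE algorithm, whose details are not given in this record. It is a simplified, deterministic variant for two numeric columns: gaps are first filled with column means, then each column is repeatedly regressed on the other and its gaps are re-predicted. The function names (`fit_line`, `chained_impute`, `rmse`), the toy data, and the choice of RMSE as the error measure are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant predictor: fall back to the mean of y
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def chained_impute(rows, n_iter=10):
    """Fill None entries in two-column numeric rows by chained regressions.

    Simplification (assumption): predictions are deterministic, with no
    random draws, so one completed dataset is produced rather than several.
    """
    n_cols = 2
    missing = [[r[j] is None for j in range(n_cols)] for r in rows]
    # Step 1: start from column means of the observed values.
    col_means = [mean(r[j] for r in rows if r[j] is not None)
                 for j in range(n_cols)]
    data = [[col_means[j] if r[j] is None else float(r[j])
             for j in range(n_cols)] for r in rows]
    # Step 2: cycle over the columns, regressing each on the other.
    for _ in range(n_iter):
        for target in range(n_cols):
            pred = 1 - target
            obs = [i for i in range(len(data)) if not missing[i][target]]
            a, b = fit_line([data[i][pred] for i in obs],
                            [data[i][target] for i in obs])
            for i in range(len(data)):
                if missing[i][target]:
                    data[i][target] = a + b * data[i][pred]
    return data


def rmse(truth, estimate):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((t - e) ** 2 for t, e in zip(truth, estimate)))


# Toy data following y = 2x + 1, with two values hidden on purpose.
rows = [[1, 3], [2, 5], [3, None], [4, 9], [None, 11], [6, 13]]
completed = chained_impute(rows)
hidden_truth = [7.0, 5.0]                        # true y of row 2, true x of row 4
chained_guess = [completed[2][1], completed[4][0]]
mean_guess = [8.2, 3.2]                          # column means of the observed values
print(rmse(hidden_truth, chained_guess), rmse(hidden_truth, mean_guess))
```

Hiding known values, imputing them, and comparing against the truth is one common way to score numeric imputation. For binary values, the same masking scheme could be scored with an F-measure instead of an error measure.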
Pages: 21
Related papers
50 in total
  • [1] SICE: an improved missing data imputation technique
    Shahidul Islam Khan
    Abu Sayed Md Latiful Hoque
    [J]. Journal of Big Data, 7
  • [2] Testing an improved method for missing data imputation
    Luo, PY
    Succop, PA
    [J]. AMERICAN STATISTICAL ASSOCIATION - 1996 PROCEEDINGS OF THE SECTION ON STATISTICS AND THE ENVIRONMENT, 1996 : 81 - 86
  • [3] An Imputation Technique for Missing Data in Propagation Measurements
    Cheng, Lin
    [J]. 2010 IEEE ANTENNAS AND PROPAGATION SOCIETY INTERNATIONAL SYMPOSIUM, 2010,
  • [4] Improved generative adversarial imputation networks for missing data
    Qin, Xiwen
    Shi, Hongyu
    Dong, Xiaogang
    Zhang, Siqi
    Yuan, Liping
    [J]. APPLIED INTELLIGENCE, 2024, 54 (21) : 11068 - 11082
  • [5] A Modified Imputation Method to Missing Data as a Preprocessing Technique
    Caparino, Elenita T.
    Sison, Ariel M.
    Medina, Ruji P.
    [J]. 2018 IEEE 10TH INTERNATIONAL CONFERENCE ON HUMANOID, NANOTECHNOLOGY, INFORMATION TECHNOLOGY, COMMUNICATION AND CONTROL, ENVIRONMENT AND MANAGEMENT (HNICEM), 2018,
  • [6] IMPUTATION OF MISSING DATA
    Lunt, M.
    [J]. ANNALS OF THE RHEUMATIC DISEASES, 2014, 73 : 49 - 49
  • [7] Improved methods for the imputation of missing data by nearest neighbor methods
    Tutz, Gerhard
    Ramzan, Shahla
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 90 : 84 - 99
  • [8] Improved KNN Imputation for Missing Values in Gene Expression Data
    Keerin, Phimmarin
    Boongoen, Tossapon
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (02) : 4009 - 4025
  • [9] Missing Data Imputation Using Ensemble Learning Technique: A Review
    Jegadeeswari, K.
    Ragunath, R.
    Rathipriya, R.
    [J]. SOFT COMPUTING FOR SECURITY APPLICATIONS, ICSCS 2022, 2023, 1428 : 223 - 236
  • [10] Missing Data Characteristics and the Choice of Imputation Technique: An Empirical Study
    Alade, Oyekale Abel
    Sallehuddin, Roselina
    Radzi, Nor Haizan Mohamed
    Selamat, Ali
    [J]. EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 88 - 97