SICE: an improved missing data imputation technique

Cited by: 0
Authors
Shahidul Islam Khan
Abu Sayed Md Latiful Hoque
Affiliations
[1] Bangladesh University of Engineering and Technology, Department of CSE
[2] International Islamic University Chittagong, Department of CSE
Keywords
Missing Data Imputation; Single Imputation; Multiple Imputation; MICE; Data Analytics
DOI
Not available
Abstract
In data analytics, missing data degrades model performance, and incorrect imputation of missing values can lead to wrong predictions. In the era of big data, when massive volumes of data are generated every second and making use of these data is a major concern for stakeholders, handling missing values efficiently becomes even more important. In this paper, we propose a new missing data imputation technique that is a hybrid of single and multiple imputation. We propose an extension of the popular Multivariate Imputation by Chained Equations (MICE) algorithm in two variants, for imputing categorical and numeric data. We also implemented twelve existing algorithms for imputing binary, ordinal, and numeric missing values. We collected sixty-five thousand real health records from different hospitals and diagnostic centers in Bangladesh while maintaining data privacy, as well as three public datasets from the UCI Machine Learning Repository, ETH Zurich, and Kaggle. Using these datasets, we compared the performance of our proposed algorithms with that of the existing algorithms. Experimental results show that our proposed algorithm achieves a 20% higher F-measure for binary data imputation and 11% lower error for numeric data imputation than its competitors, with similar execution time.
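The abstract does not spell out the SICE procedure itself. As a point of reference, the sketch below shows the chained-equations idea that MICE is built on, for numeric data only, plus one hypothetical way to collapse several stochastic imputations into a single completed dataset (one possible reading of "a hybrid of single and multiple imputation"). The function names, the ordinary-least-squares conditional model, the column-mean initial fill, the residual-noise draw, and the averaging step are all assumptions for illustration, not the authors' algorithm.

```python
import numpy as np


def chained_equations_impute(X, n_iter=10, rng=None, noise=True):
    """One stochastic MICE-style pass for numeric data (illustrative sketch).

    X: 2-D array with np.nan marking missing cells. Each incomplete column
    is regressed (ordinary least squares, an assumed model choice) on all
    other columns; its missing cells are replaced by the prediction, plus
    Gaussian noise scaled by the residual spread when noise=True.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()  # never modify the caller's array
    missing = np.isnan(X)
    # Initial fill: replace each missing cell with its column's observed mean.
    X = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # complete columns are used only as predictors
            obs = ~miss_j
            # Design matrix: intercept plus every other (currently filled) column.
            design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(design[obs], X[obs, j], rcond=None)
            pred = design[miss_j] @ coef
            if noise:
                resid = X[obs, j] - design[obs] @ coef
                pred = pred + rng.normal(0.0, resid.std(), size=pred.shape)
            X[miss_j, j] = pred
    return X


def pooled_single_imputation(X, m=5, n_iter=10, seed=0):
    """Hypothetical hybrid step: run m stochastic imputations, then average
    them cell by cell into one completed dataset (numeric data only)."""
    rng = np.random.default_rng(seed)
    draws = [chained_equations_impute(X, n_iter, rng) for _ in range(m)]
    return np.mean(draws, axis=0)
```

Averaging only makes sense for numeric columns; for categorical data, a comparable hypothetical pooling step would take the most frequent imputed category across the m draws instead.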