Missing Value Imputation for Diabetes Prediction

Cited by: 2
Authors
Luo, Fei [1]
Qian, Hangwei [1]
Wang, Di [1]
Guo, Xu [1,2]
Sun, Yan [3]
Lee, Eng Sing [4]
Teong, Hui Hwang [5]
Lai, Ray Tian Rui [5]
Miao, Chunyan [1,2]
Affiliations
[1] Nanyang Technol Univ, Joint NTU UBC Res Ctr Excellence Act Living Elder, Singapore, Singapore
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Natl Healthcare Grp, Hlth Serv & Outcomes Res Dept, Singapore, Singapore
[4] Natl Healthcare Grp Polyclin, Singapore, Singapore
[5] Tan Tock Seng Hosp, Singapore, Singapore
Funding
National Research Foundation, Singapore;
Keywords
diabetes-related dataset; diabetes prediction; missing values; data imputation techniques; CHAINED EQUATIONS; SYSTEM;
DOI
10.1109/IJCNN55064.2022.9892398
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning (ML) models have been widely used to improve the accuracy and efficiency of various disease diagnostic tasks. However, applying ML models to diabetes-related prediction tasks remains challenging, mainly because patients' health records are sparse and contain a large number of missing values. Missing values often break diabetes prediction pipelines, posing challenges to existing approaches. The problem worsens significantly when critical attribute values (e.g., blood test results for HbA1c, FPG and OGTT2hr) are missing. In this paper, we introduce a large-scale diabetes-related dataset named the Chronic Disease Management System (CDMS) dataset, which collects the clinical records of more than 700,000 visits by over 65,000 patients across eight years. CDMS is anonymously collected and has a high percentage of missing values on several attributes that are critical for diabetes prediction. If not handled carefully, these missing values cause significant performance degradation in the applied ML models. We also investigate the effectiveness of multiple data imputation methods through extensive experiments on CDMS. Experimental results show that k-Nearest Neighbor Imputation (KNNI) performs better than the other methods in this diabetes prediction task. Specifically, with KNNI applied, the diabetes prediction accuracy and precision are both over 0.8 across various ML predictive models.
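The abstract names k-Nearest Neighbor Imputation (KNNI) but gives no implementation details, so the following is only a minimal sketch of the general technique, not the authors' method. It assumes numeric features, a root-mean-square distance over the columns two records both have observed, and an unweighted mean of the k nearest donors. The function name `knn_impute`, the two-column layout and all values are hypothetical and purely illustrative.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Assumption (not from the paper): distance is the root mean squared
    difference over the columns that both rows have observed.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed column: not comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy so the input is not mutated
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that have column j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical records with two made-up numeric features per visit.
records = [
    [5.4, 5.0],
    [5.6, 5.2],
    [8.1, 9.0],
    [5.5, None],  # second feature missing for this visit
]
filled = knn_impute(records, k=2)
print(filled[3])
```

Here the last record is closest to the first two on its observed feature, so its missing value is filled from those two donors rather than from the outlying third record. In practice a library routine such as scikit-learn's `KNNImputer` would usually be preferred, typically after feature scaling.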
Pages: 8