Comparison of Missing Data Imputation Methods using the Framingham Heart study dataset

Cited by: 3
Authors
Psychogyios, Konstantinos [1]
Ilias, Loukas [1]
Askounis, Dimitris [1]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, Decis Support Syst Lab, Athens 15780, Greece
Keywords
Cardiovascular disease prediction; Missing value imputation; Deep learning; Generative Adversarial Networks; Autoencoders
DOI
10.1109/BHI56158.2022.9926882
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels and, according to the World Health Organization, is the leading cause of death worldwide. Electronic health record (EHR) data for CVD, as for medical cases in general, very frequently contain missing values. The percentage of missingness varies and is linked to instrument errors, manual data entry procedures, etc. Even though the missing rate is usually significant, missing value imputation is in many cases handled poorly, either with case deletion or with simple statistical approaches such as mode and median imputation. These methods are known to introduce significant bias, since they do not account for the relationships between the dataset's variables. In the medical domain, many datasets consist of lab tests or patient medical tests, where these relationships are present and strong. To address these limitations, in this paper we test and modify state-of-the-art missing value imputation methods based on Generative Adversarial Networks (GANs) and Autoencoders. The evaluation covers both data imputation and post-imputation prediction. For the imputation task, we achieve improvements of 0.20 and 7.00% in normalised Root Mean Squared Error (RMSE) and Area Under the Receiver Operating Characteristic Curve (AUROC), respectively. For the post-imputation prediction task, our models outperform the standard approaches by 2.50% in F1-score.
Pages: 5