Comparison of Missing Data Imputation Methods using the Framingham Heart study dataset

Cited by: 3
Authors
Psychogyios, Konstantinos [1]
Ilias, Loukas [1]
Askounis, Dimitris [1]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, Decis Support Syst Lab, Athens 15780, Greece
Keywords
Cardiovascular disease prediction; Missing value imputation; Deep learning; Generative Adversarial Networks; Autoencoders
DOI
10.1109/BHI56158.2022.9926882
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels and, according to the World Health Organization, is the leading cause of death worldwide. Electronic health record (EHR) data for CVD, like medical data in general, very frequently contain missing values. The percentage of missingness varies and is linked to instrument errors, manual data entry procedures, etc. Even though the missing rate is usually significant, in many cases missing value imputation is handled poorly, either with case deletion or with simple statistical approaches such as mode and median imputation. These methods are known to introduce significant bias, since they do not account for the relationships between the dataset's variables. In the medical domain, many datasets consist of lab tests or patient medical tests, where these relationships are present and strong. To address these limitations, in this paper we test and modify state-of-the-art missing value imputation methods based on Generative Adversarial Networks (GANs) and Autoencoders. The evaluation covers both data imputation and post-imputation prediction. For the imputation task, we achieve improvements of 0.20 in normalised Root Mean Squared Error (RMSE) and 7.00% in Area Under the Receiver Operating Characteristic Curve (AUROC). For the post-imputation prediction task, our models outperform the standard approaches by 2.50% in F1-score.
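As a rough illustration of the baseline the abstract criticises, the sketch below applies per-column median imputation and scores it with a normalised RMSE on the artificially masked entries. It is not the authors' code: the function names `median_impute` and `normalised_rmse`, the synthetic data, the 20% missing rate, and the choice of min-max scaling as the normalisation are all assumptions made for this example.

```python
import numpy as np


def median_impute(X):
    """Fill each NaN with the median of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    medians = np.nanmedian(X, axis=0)        # per-column median, ignoring NaNs
    rows, cols = np.where(np.isnan(X))       # positions of the missing entries
    filled[rows, cols] = medians[cols]
    return filled


def normalised_rmse(X_true, X_imputed, mask):
    """RMSE over the masked entries after min-max scaling each column.

    Assumption: columns are scaled to [0, 1] using the complete data;
    the paper may normalise differently.
    """
    X_true = np.asarray(X_true, dtype=float)
    X_imputed = np.asarray(X_imputed, dtype=float)
    lo = X_true.min(axis=0)
    span = X_true.max(axis=0) - lo
    span[span == 0] = 1.0                    # avoid division by zero for constant columns
    diff = ((X_true - lo) / span - (X_imputed - lo) / span)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-in data (hypothetical, not the Framingham dataset).
rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 4))
X_true[:, 1] = 0.8 * X_true[:, 0] + 0.2 * X_true[:, 1]   # two correlated columns

mask = rng.random(X_true.shape) < 0.2        # hide roughly 20% of entries at random
X_missing = X_true.copy()
X_missing[mask] = np.nan

X_filled = median_impute(X_missing)
score = normalised_rmse(X_true, X_filled, mask)
print(f"normalised RMSE of median imputation: {score:.3f}")
```

Because the median of a column ignores every other column, the correlation built into the first two synthetic columns is not used when filling gaps, which is the limitation the abstract attributes to simple statistical imputation.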
Pages: 5
Related papers
50 in total
  • [1] Comparison of missing data imputation methods using weather data
    Nida, Hafiza
    Kashif, Muhammad
    Khan, Muhammad Imran
    Ghamkhar, Madiha
    [J]. PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES, 2023, 60 (02) : 327 - 336
  • [2] Evaluating Imputation Methods for Missing Data in a MCI Dataset
    Gomez-Valades Batanero, Alba
    Rincon Zamorano, Mariano
    Martinez Tomas, Rafael
    Guerrero Martin, Juan
    [J]. ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I, 2022, 13258 : 446 - 454
  • [3] Imputation of missing longitudinal data: a comparison of methods
    Engels, JM
    Diehr, P
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2003, 56 (10) : 968 - 976
  • [4] Missing traffic data: comparison of imputation methods
    Li, Yuebiao
    Li, Zhiheng
    Li, Li
    [J]. IET INTELLIGENT TRANSPORT SYSTEMS, 2014, 8 (01) : 51 - 57
  • [5] Comparison of Performance of Data Imputation Methods for Numeric Dataset
    Jadhav, Anil
    Pramod, Dhanya
    Ramanathan, Krishnan
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2019, 33 (10) : 913 - 933
  • [6] A comparison of imputation methods for the consecutive missing temperature data
    Kim, Hee-Kyung
    Kang, In-Kyeong
    Lee, Jae-Won
    Lee, Yung-Seop
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2016, 29 (03) : 549 - 557
  • [7] Application and Comparison of Imputation Methods for Missing Degradation Data
    Fan, Ye
    Sun, Fuqiang
    Jiang, Tongmin
    [J]. ENGINEERING ASSET MANAGEMENT - SYSTEMS, PROFESSIONAL PRACTICES AND CERTIFICATION, 2015, : 1607 - 1614
  • [8] Comparison of imputation methods for missing laboratory data in medicine
    Waljee, Akbar K.
    Mukherjee, Ashin
    Singal, Amit G.
    Zhang, Yiwei
    Warren, Jeffrey
    Balis, Ulysses
    Marrero, Jorge
    Zhu, Ji
    Higgins, Peter D. R.
    [J]. BMJ OPEN, 2013, 3 (08):
  • [9] Missing Network Data A Comparison of Different Imputation Methods
    Krause, Robert W.
    Huisman, Mark
    Steglich, Christian
    Snijders, Tom A. B.
    [J]. 2018 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2018, : 159 - 163
  • [10] Performance Analysis of Various Missing Value Imputation Methods on Heart Failure Dataset
    Al Khaldy, Mohammad
    Kambhampati, Chandrasekhar
    [J]. PROCEEDINGS OF SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) 2016, VOL 2, 2018, 16 : 415 - 425