Interpretable Data-Driven Approach Based on Feature Selection Methods and GAN-Based Models for Cardiovascular Risk Prediction in Diabetic Patients

Cited by: 2
Authors
Chushig-Muzo, David [1 ]
Calero-Diaz, Hugo [1 ]
Lara-Abelenda, Francisco J. [1 ]
Gomez-Martinez, Vanesa [1 ]
Granja, Conceicao [2 ]
Soguero-Ruiz, Cristina [1 ]
Affiliations
[1] Rey Juan Carlos Univ, Dept Signal Theory & Commun Telemat & Comp, Fuenlabrada 28943, Madrid, Spain
[2] Univ Hosp North Norway, Norwegian Ctr Ehlth Res, N-9038 Tromso, Norway
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Diabetes; Predictive models; Training; Radio frequency; Data models; Biological system modeling; Feature extraction; Cardiovascular system; Generative adversarial networks; Cardiovascular risk prediction; type 1 diabetes; machine learning; interpretable methods; feature selection; generative adversarial networks; accumulated local effects; post-hoc interpretability; CTGAN; DISEASE; EVENTS; NETWORKS; 1ST;
DOI
10.1109/ACCESS.2024.3412789
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Noncommunicable diseases (NCDs) are the leading cause of morbidity and mortality worldwide. Cardiovascular diseases (CVDs) and diabetes are the most prevalent NCDs, causing 1.9 and 1.5 million deaths yearly, respectively. Individuals diagnosed with type 1 diabetes (T1D) are at high risk of developing CVDs. Machine learning (ML) models have provided outstanding results in different domains, including healthcare, yielding models with high predictive performance. The aim of this study was to develop an interpretable data-driven approach to predict the 10-year CVD risk for older individuals with T1D, providing both reasonable predictive performance and the identification of risk factors associated with CVDs. Data from T1D individuals at the Steno Diabetes Center Copenhagen were used. Different ML-based models were considered, including k-nearest neighbors (KNN), decision tree, random forest, and multilayer perceptron (MLP). To enhance the predictive performance of the ML models, the conditional tabular generative adversarial network (CTGAN) was used to create synthetic data and increase the size of the training set. Several filter and wrapper feature selection (FS) techniques were considered for identifying the most relevant features involved in CVD risk and enhancing the performance of the ML-based models. To gain interpretability of the predictive models, we used two post-hoc methods: SHAP and accumulated local effects. The experimental results showed strong performance of the FS and ML-based models for predicting CVD risk. In particular, the MLP obtained the best results, with a mean absolute error of 0.0088 and a mean relative absolute error of 0.0817. Regarding risk factors, age, HbA1c, and albuminuria were identified as crucial in CVD risk prediction, which is in line with recent clinical evidence. Our study contributes to identifying CVD risk and associated risk factors in a data-driven manner, supporting early interventions and adequate treatments to prevent CVDs.
Pages: 84292 - 84305
Number of pages: 14
Related papers
50 in total
  • [1] Data-driven cardiovascular risk prediction and prognosis factor identification in diabetic patients
    Calero-Diaz, Hugo
    Chushig-Muzo, David
    Fabelo, Himar
    Mora-Jimenez, Inmaculada
    Granja, Conceicao
    Soguero-Ruiz, Cristina
    2022 IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS (BHI) JOINTLY ORGANISED WITH THE IEEE-EMBS INTERNATIONAL CONFERENCE ON WEARABLE AND IMPLANTABLE BODY SENSOR NETWORKS (BSN'22), 2022,
  • [2] An interpretable data-driven approach for rules construction: application to cardiovascular risk assessment
    Mendes, D.
    Paredes, S.
    Rocha, T.
    Carvalho, P.
    Henriques, J.
    Morais, J.
    2017 39TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2017, : 2646 - 2649
  • [3] A causality based feature selection approach for data-driven dynamic security assessment
    Bellizio, Federica
    Cremer, Jochen L.
    Sun, Mingyang
    Strbac, Goran
    ELECTRIC POWER SYSTEMS RESEARCH, 2021, 201 (201)
  • [4] Assessment of Cardiovascular Risk based on a Data-driven Knowledge Discovery Approach
    Mendes, D.
    Paredes, S.
    Rocha, T.
    Carvalho, P.
    Henriques, J.
    Cabiddu, R.
    Morais, J.
    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015, : 6800 - 6803
  • [5] A GAN-Based Data Injection Attack Method on Data-Driven Strategies in Power Systems
    Liu, Zengji
    Wang, Qi
    Ye, Yujian
    Tang, Yi
    IEEE TRANSACTIONS ON SMART GRID, 2022, 13 (04) : 3203 - 3213
  • [6] A Data-Driven Approach for Building a Cardiovascular Disease Risk Prediction System
    Wang, Hongkuan
    Wong, Raymond K.
    Ong, Kwok Leung
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT IV, PAKDD 2024, 2024, 14648 : 271 - 283
  • [7] Prediction-Based Power Consumption Monitoring of Industrial Equipment Using Interpretable Data-Driven Models
    Xiao, Hui
    Hu, Wenshan
    Zhou, Hong
    Liu, Guo-Ping
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (02) : 1312 - 1322
  • [8] Comparison of an interpretable data-driven approach with state of the art classifiers: application to cardiovascular risk assessment
    Mendes, Diana
    de Carvalho, Paulo
    Henriques, Jorge
    Paredes, Simao
    Rocha, Teresa
    Morais, Joao
    2017 IEEE 3RD INTERNATIONAL FORUM ON RESEARCH AND TECHNOLOGIES FOR SOCIETY AND INDUSTRY (RTSI), 2017, : 440 - 444
  • [9] Efficient Data-Driven Machine Learning Models for Cardiovascular Diseases Risk Prediction
    Dritsas, Elias
    Trigka, Maria
    SENSORS, 2023, 23 (03)
  • [10] Ontology-based feature transformations: A data-driven approach
    Ginter, F
    Pyysalo, S
    Boberg, J
    Järvinen, J
    Salakoski, T
    ADVANCES IN NATURAL LANGUAGE PROCESSING, 2004, 3230 : 279 - 290