Interpretable Data-Driven Approach Based on Feature Selection Methods and GAN-Based Models for Cardiovascular Risk Prediction in Diabetic Patients

Cited by: 2

Authors:

Chushig-Muzo, David [1]

Calero-Diaz, Hugo [1]

Lara-Abelenda, Francisco J. [1]

Gomez-Martinez, Vanesa [1]

Granja, Conceicao [2]

Soguero-Ruiz, Cristina [1]

Affiliations:

[1] Rey Juan Carlos Univ, Dept Signal Theory & Commun Telemat & Comp, Fuenlabrada 28943, Madrid, Spain

[2] Univ Hosp North Norway, Norwegian Ctr Ehlth Res, N-9038 Tromso, Norway

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Diabetes; Predictive models; Training; Radio frequency; Data models; Biological system modeling; Feature extraction; Cardiovascular system; Generative adversarial networks; Cardiovascular risk prediction; type 1 diabetes; machine learning; interpretable methods; feature selection; generative adversarial networks; accumulated local effects; post-hoc interpretability; CTGAN; DISEASE; EVENTS; NETWORKS; 1ST;

DOI:

10.1109/ACCESS.2024.3412789

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Noncommunicable diseases (NCDs) are the leading cause of morbidity and mortality worldwide. Cardiovascular diseases (CVDs) and diabetes are the most prevalent NCDs, causing 17.9 million and 1.5 million deaths per year, respectively. Individuals diagnosed with type 1 diabetes (T1D) are at high risk of developing CVDs. Machine learning (ML) models have achieved high predictive performance in many domains, including healthcare. The aim of this study was to develop an interpretable data-driven approach to predict the 10-year CVD risk in older individuals with T1D, providing both reasonable predictive performance and the identification of risk factors associated with CVDs. Data from individuals with T1D at the Steno Diabetes Center Copenhagen were used. Several ML models were considered, including K-nearest neighbors (KNN), decision tree, random forest, and multilayer perceptron (MLP). To enhance their predictive performance, the conditional tabular generative adversarial network (CTGAN) was used to generate synthetic data and enlarge the training set. Several filter and wrapper feature selection (FS) techniques were considered to identify the features most relevant to CVD risk and to improve the performance of the ML models. To make the predictive models interpretable, we used two post-hoc methods: SHapley Additive exPlanations (SHAP) and accumulated local effects (ALE). The experimental results showed strong performance of the FS techniques combined with ML models for predicting CVD risk. In particular, the MLP obtained the best results, with a mean absolute error of 0.0088 and a mean relative absolute error of 0.0817. Regarding risk factors, age, HbA1c, and albuminuria were identified as crucial in CVD risk prediction, which is in line with recent clinical evidence. Our study contributes to identifying CVD risk and associated risk factors in a data-driven manner, supporting early interventions and adequate treatments to prevent CVDs.
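The abstract names accumulated local effects (ALE) as one of the post-hoc interpretability methods. As a rough illustration of what a first-order ALE curve computes for a single feature, here is a minimal, dependency-free sketch. It is not the authors' code: the function name `ale_1d`, the quantile-based binning, the centering by count-weighted bin midpoints, and the toy model and data are all assumptions made for illustration. The record does not say which ALE implementation or binning the study used.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def ale_1d(
    predict: Callable[[Row], float],
    X: Sequence[Row],
    feature: int,
    n_bins: int = 5,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: first-order ALE of one feature as (grid edges, centered ALE per edge)."""
    values = sorted(row[feature] for row in X)
    n = len(values)
    # Grid edges at empirical quantiles; the set removes ties so every bin has positive width.
    edges = sorted({values[round(q * (n - 1) / n_bins)] for q in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("the feature needs at least two distinct values")

    local_effects: List[float] = []
    counts: List[int] = []
    for k in range(1, len(edges)):
        lo, hi = edges[k - 1], edges[k]
        diffs = []
        for row in X:
            x = row[feature]
            # Bins are (lo, hi]; the first bin also keeps the minimum value.
            if lo < x <= hi or (k == 1 and x == lo):
                upper, lower = list(row), list(row)
                upper[feature], lower[feature] = hi, lo
                # Local effect: prediction change when only this feature moves across the bin.
                diffs.append(predict(upper) - predict(lower))
        counts.append(len(diffs))
        local_effects.append(sum(diffs) / len(diffs) if diffs else 0.0)

    # Accumulate the averaged local effects along the grid (uncentered ALE, 0 at the first edge).
    uncentered = [0.0]
    for effect in local_effects:
        uncentered.append(uncentered[-1] + effect)

    # Center so the count-weighted mean of the bin-midpoint ALE values is zero.
    offset = sum(
        c * (uncentered[k] + uncentered[k + 1]) / 2 for k, c in enumerate(counts)
    ) / sum(counts)
    return edges, [g - offset for g in uncentered]


if __name__ == "__main__":
    # Hypothetical toy model and data, not the study's MLP or the Steno cohort.
    toy_X = [[float(i), float(i % 3)] for i in range(11)]
    grid, ale = ale_1d(lambda r: 2 * r[0] + r[1], toy_X, feature=0, n_bins=5)
    for z, a in zip(grid, ale):
        print(f"x0 = {z:4.1f}  ALE = {a:+.3f}")
```

For a linear toy model the ALE curve of a feature is a straight line whose slope equals that feature's coefficient, which makes a convenient sanity check. Unlike partial dependence, ALE only perturbs each sample within its own bin, so it avoids evaluating the model on unrealistic feature combinations. This property is useful when clinical variables such as age and HbA1c are correlated.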

Pages: 84292-84305

Number of pages: 14
