Adapting transformer-based language models for heart disease detection and risk factors extraction

Cited by: 0
Authors
Houssein, Essam H. [1 ]
Mohamed, Rehab E. [1 ]
Hu, Gang [2 ]
Ali, Abdelmgeid A. [1 ]
Affiliations
[1] Minia Univ, Fac Comp & Informat, Al Minya, Egypt
[2] Xian Univ Technol, Dept Appl Math, Xian 710054, Peoples R China
Keywords
Coronary artery disease; Electronic health records; Natural language processing; Bidirectional encoder representations from transformers; Heart disease; Transformer-based models; Medication information; Identification; Text; Recognition; Cohort
DOI
10.1186/s40537-024-00903-y
CLC Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Efficiently treating cardiac patients before the onset of a heart attack relies on the precise prediction of heart disease. Identifying and detecting risk factors for heart disease, such as diabetes mellitus, Coronary Artery Disease (CAD), hyperlipidemia, hypertension, smoking, familial CAD history, obesity, and medications, is critical for developing effective prevention and management measures. Although Electronic Health Records (EHRs) have emerged as valuable resources for identifying these risk factors, their unstructured format poses challenges for cardiologists in retrieving relevant information. This research proposed employing transfer learning techniques to automatically extract heart disease risk factors from EHRs. Transfer learning, a deep learning technique, has demonstrated strong performance in various clinical natural language processing (NLP) applications, particularly in heart disease risk prediction. This study explored transformer-based language models, specifically the pre-trained architectures BERT (Bidirectional Encoder Representations from Transformers), RoBERTa, BioClinicalBERT, XLNet, and BioBERT, for heart disease detection and the extraction of related risk factors from clinical notes in the i2b2 dataset. These transformer models are pre-trained on extensive corpora of medical literature and clinical records to gain a deep understanding of contextualized language representations. The adapted models are then fine-tuned on annotated datasets specific to heart disease, such as the i2b2 dataset, enabling them to learn patterns and relationships within the domain. These models have demonstrated superior performance in extracting semantic information from EHRs, automating high-performance heart disease risk factor identification, and performing downstream NLP tasks within the clinical domain. Concretely, the five widely used transformer-based models, BERT, RoBERTa, BioClinicalBERT, XLNet, and BioBERT, were fine-tuned on the 2014 i2b2 clinical NLP challenge dataset. The fine-tuned models surpassed conventional approaches in predicting the presence of heart disease risk factors with impressive accuracy. The RoBERTa model achieved the highest performance, with a micro F1-score of 94.27%, while the BERT, BioClinicalBERT, XLNet, and BioBERT models provided competitive performance, with micro F1-scores of 93.73%, 94.03%, 93.97%, and 93.99%, respectively. Finally, a simple ensemble of the five transformer-based models was proposed, which outperformed most existing methods in heart disease risk factor identification, achieving a micro F1-score of 94.26%. This study demonstrated the efficacy of transfer learning with transformer-based models in enhancing risk prediction and facilitating early intervention for heart disease prevention.
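The fine-tuning pipeline the abstract describes can be made concrete with a short sketch. The snippet below uses the Hugging Face transformers library to fine-tune a BERT checkpoint as a multi-label classifier over the eight risk-factor categories named above. The task framing, label set, hyperparameters, and toy notes are illustrative assumptions only: the i2b2 2014 corpus is access-controlled, and the abstract does not specify these details.

```python
# Minimal fine-tuning sketch, assuming a multi-label classification framing
# of the i2b2 2014 risk-factor task. Label set, data, and hyperparameters
# are illustrative, not the paper's actual configuration.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RISK_FACTORS = ["diabetes", "CAD", "hyperlipidemia", "hypertension",
                "smoking", "family_history_CAD", "obesity", "medication"]

# Toy stand-ins for annotated i2b2 notes: (text, indices of active labels).
notes = [
    ("Pt with DM2 and HTN, started on metformin and lisinopril.", [0, 3, 7]),
    ("Non-smoker, no family history of CAD. BMI 24.", []),
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(RISK_FACTORS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

def collate(batch):
    # Tokenize a batch of notes and build float multi-hot label targets.
    texts, label_idxs = zip(*batch)
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    labels = torch.zeros(len(texts), len(RISK_FACTORS))
    for row, idxs in zip(labels, label_idxs):
        for i in idxs:
            row[i] = 1.0
    enc["labels"] = labels
    return enc

loader = DataLoader(notes, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss  # BCEWithLogitsLoss via problem_type
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The same loop applies to the other four models by swapping the checkpoint name (e.g. roberta-base, xlnet-base-cased, emilyalsentzer/Bio_ClinicalBERT, dmis-lab/biobert-v1.1). The closing claim, a simple ensemble of the five fine-tuned models, can be sketched the same way. The abstract does not state the combination rule; averaging per-label sigmoid probabilities and thresholding at 0.5 is one plausible reading of "simple". ensemble_predict is a hypothetical helper, and each model needs its own tokenizer in practice, since BERT and RoBERTa tokenize differently.

```python
# Sketch of a simple probability-averaging ensemble; the paper's actual
# combination rule is not given in the abstract. Reuses RISK_FACTORS from
# the fine-tuning sketch above.
import torch

@torch.no_grad()
def ensemble_predict(models, tokenizers, text, threshold=0.5):
    # Average per-label sigmoid probabilities across the fine-tuned models.
    probs = []
    for model, tok in zip(models, tokenizers):
        enc = tok(text, truncation=True, max_length=512, return_tensors="pt")
        probs.append(model(**enc).logits.sigmoid())
    mean = torch.stack(probs).mean(dim=0).squeeze(0)
    idxs = (mean > threshold).nonzero(as_tuple=True)[0].tolist()
    return [RISK_FACTORS[i] for i in idxs]

# Usage, assuming five fine-tuned (model, tokenizer) pairs:
# ensemble_predict([bert, roberta, clin_bert, xlnet, bio_bert],
#                  [tok_bert, tok_roberta, ...], "Pt with DM2 and HTN ...")
```

Note that the reported micro F1-scores pool true and false positives across all labels before computing F1, so frequent risk factors weigh more heavily than they would under macro averaging.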
Pages: 27
Related Papers
50 records in total (first 10 shown)
  • [1] Adapting transformer-based language models for heart disease detection and risk factors extraction. Houssein, Essam H.; Mohamed, Rehab E.; Hu, Gang; Ali, Abdelmgeid A. Journal of Big Data, 2024, 11.
  • [2] RadBERT: Adapting Transformer-based Language Models to Radiology. Yan, An; McAuley, Julian; Lu, Xing; Du, Jiang; Chang, Eric Y.; Gentili, Amilcare; Hsu, Chun-Nan. Radiology: Artificial Intelligence, 2022, 4(4).
  • [3] Transformer-Based Language Models for Software Vulnerability Detection. Thapa, Chandra; Jang, Seung Ick; Ahmed, Muhammad Ejaz; Camtepe, Seyit; Pieprzyk, Josef; Nepal, Surya. Proceedings of the 38th Annual Computer Security Applications Conference (ACSAC 2022), 2022: 481-496.
  • [4] Transformer-based Extraction of Deep Image Models. Battis, Verena; Penner, Alexander. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P 2022), 2022: 320-336.
  • [5] Transformer-based models for multimodal irony detection. Tomás, D.; Ortega-Bueno, R.; Zhang, G.; Rosso, P.; Schifanella, R. Journal of Ambient Intelligence and Humanized Computing, 2023, 14(6): 7399-7410.
  • [6] Adaptation of Transformer-Based Models for Depression Detection. Adebanji, Olaronke O.; Ojo, Olumide E.; Calvo, Hiram; Gelbukh, Irina; Sidorov, Grigori. Computación y Sistemas, 2024, 28(1): 151-165.
  • [7] Ouroboros: On Accelerating Training of Transformer-Based Language Models. Yang, Qian; Huo, Zhouyuan; Wang, Wenlin; Huang, Heng; Carin, Lawrence. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019.
  • [8] A Comparison of Transformer-Based Language Models on NLP Benchmarks. Greco, Candida Maria; Tagarelli, Andrea; Zumpano, Ester. Natural Language Processing and Information Systems (NLDB 2022), 2022, 13286: 490-501.
  • [9] TAG: Gradient Attack on Transformer-based Language Models. Deng, Jieren; Wang, Yijue; Li, Ji; Wang, Chenghong; Shang, Chao; Liu, Hang; Rajasekaran, Sanguthevar; Ding, Caiwen. Findings of the Association for Computational Linguistics: EMNLP 2021, 2021: 3600-3610.
  • [10] Applications of transformer-based language models in bioinformatics: a survey. Zhang, Shuang; Fan, Rui; Liu, Yuti; Chen, Shuang; Liu, Qiao; Zeng, Wanwen. Bioinformatics Advances, 2023, 3(1).