Adapting transformer-based language models for heart disease detection and risk factors extraction

Cited by: 0
Authors
Houssein, Essam H. [1 ]
Mohamed, Rehab E. [1 ]
Hu, Gang [2 ]
Ali, Abdelmgeid A. [1 ]
Affiliations
[1] Minia Univ, Fac Comp & Informat, Al Minya, Egypt
[2] Xian Univ Technol, Dept Appl Math, Xian 710054, Peoples R China
Keywords
Coronary artery disease; Electronic health records; Natural language processing; Bidirectional encoder representations from transformers; Heart disease; Transformer-based models; Medication information; Identification; Text; Recognition; Cohort
DOI
10.1186/s40537-024-00903-y
CLC Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Efficiently treating cardiac patients before the onset of a heart attack relies on the precise prediction of heart disease. Identifying and detecting risk factors for heart disease, such as diabetes mellitus, Coronary Artery Disease (CAD), hyperlipidemia, hypertension, smoking, familial CAD history, obesity, and medications, is critical for developing effective prevention and management measures. Although Electronic Health Records (EHRs) have emerged as valuable resources for identifying these risk factors, their unstructured format poses challenges for cardiologists in retrieving relevant information. This research proposed employing transfer learning techniques to automatically extract heart disease risk factors from EHRs. Transfer learning, a deep learning technique, has demonstrated strong performance in various clinical natural language processing (NLP) applications, particularly in heart disease risk prediction. This study explored transformer-based language models, specifically the pre-trained architectures BERT (Bidirectional Encoder Representations from Transformers), RoBERTa, BioClinicalBERT, XLNet, and BioBERT, for heart disease detection and the extraction of related risk factors from clinical notes in the i2b2 dataset. These transformer models are pre-trained on extensive corpora of medical literature and clinical records to gain a deep understanding of contextualized language representations. The adapted models are then fine-tuned on annotated datasets specific to heart disease, such as the i2b2 dataset, enabling them to learn patterns and relationships within the domain. These models have demonstrated superior performance in extracting semantic information from EHRs, automating high-performance heart disease risk factor identification, and performing downstream NLP tasks within the clinical domain. Concretely, the five widely used transformer-based models, BERT, RoBERTa, BioClinicalBERT, XLNet, and BioBERT, were fine-tuned on the 2014 i2b2 clinical NLP challenge dataset. The fine-tuned models surpassed conventional approaches in predicting the presence of heart disease risk factors with impressive accuracy. The RoBERTa model achieved the highest performance, with a micro F1-score of 94.27%, while the BERT, BioClinicalBERT, XLNet, and BioBERT models provided competitive performance, with micro F1-scores of 93.73%, 94.03%, 93.97%, and 93.99%, respectively. Finally, a simple ensemble of the five transformer-based models was proposed, which outperformed most existing methods in heart disease risk factor identification, achieving a micro F1-score of 94.26%. This study demonstrated the efficacy of transfer learning with transformer-based models in enhancing risk prediction and facilitating early intervention for heart disease prevention.
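The fine-tuning pipeline the abstract describes can be made concrete with a short sketch. The snippet below uses the Hugging Face transformers library to fine-tune a BERT checkpoint as a multi-label classifier over the eight risk-factor categories named above. The task framing, label set, hyperparameters, and toy notes are illustrative assumptions only: the i2b2 2014 corpus is access-controlled, and the abstract does not specify these details.

```python
# Minimal fine-tuning sketch, assuming a multi-label classification framing
# of the i2b2 2014 risk-factor task. Label set, data, and hyperparameters
# are illustrative, not the paper's actual configuration.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RISK_FACTORS = ["diabetes", "CAD", "hyperlipidemia", "hypertension",
                "smoking", "family_history_CAD", "obesity", "medication"]

# Toy stand-ins for annotated i2b2 notes: (text, indices of active labels).
notes = [
    ("Pt with DM2 and HTN, started on metformin and lisinopril.", [0, 3, 7]),
    ("Non-smoker, no family history of CAD. BMI 24.", []),
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(RISK_FACTORS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

def collate(batch):
    # Tokenize a batch of notes and build float multi-hot label targets.
    texts, label_idxs = zip(*batch)
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    labels = torch.zeros(len(texts), len(RISK_FACTORS))
    for row, idxs in zip(labels, label_idxs):
        for i in idxs:
            row[i] = 1.0
    enc["labels"] = labels
    return enc

loader = DataLoader(notes, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss  # BCEWithLogitsLoss via problem_type
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The same loop applies to the other four models by swapping the checkpoint name (e.g. roberta-base, xlnet-base-cased, emilyalsentzer/Bio_ClinicalBERT, dmis-lab/biobert-v1.1). The closing claim, a simple ensemble of the five fine-tuned models, can be sketched the same way. The abstract does not state the combination rule; averaging per-label sigmoid probabilities and thresholding at 0.5 is one plausible reading of "simple". ensemble_predict is a hypothetical helper, and each model needs its own tokenizer in practice, since BERT and RoBERTa tokenize differently.

```python
# Sketch of a simple probability-averaging ensemble; the paper's actual
# combination rule is not given in the abstract. Reuses RISK_FACTORS from
# the fine-tuning sketch above.
import torch

@torch.no_grad()
def ensemble_predict(models, tokenizers, text, threshold=0.5):
    # Average per-label sigmoid probabilities across the fine-tuned models.
    probs = []
    for model, tok in zip(models, tokenizers):
        enc = tok(text, truncation=True, max_length=512, return_tensors="pt")
        probs.append(model(**enc).logits.sigmoid())
    mean = torch.stack(probs).mean(dim=0).squeeze(0)
    idxs = (mean > threshold).nonzero(as_tuple=True)[0].tolist()
    return [RISK_FACTORS[i] for i in idxs]

# Usage, assuming five fine-tuned (model, tokenizer) pairs:
# ensemble_predict([bert, roberta, clin_bert, xlnet, bio_bert],
#                  [tok_bert, tok_roberta, ...], "Pt with DM2 and HTN ...")
```

Note that the reported micro F1-scores pool true and false positives across all labels before computing F1, so frequent risk factors weigh more heavily than they would under macro averaging.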
Pages: 27
Related Papers
50 records in total (first 10 shown)
  • [1] Adapting transformer-based language models for heart disease detection and risk factors extraction. Houssein, Essam H.; Mohamed, Rehab E.; Hu, Gang; Ali, Abdelmgeid A. Journal of Big Data, 2024, 11.
  • [2] RadBERT: Adapting Transformer-based Language Models to Radiology. Yan, An; McAuley, Julian; Lu, Xing; Du, Jiang; Chang, Eric Y.; Gentili, Amilcare; Hsu, Chun-Nan. Radiology: Artificial Intelligence, 2022, 4(4).
  • [3] Transformer-Based Language Models for Software Vulnerability Detection. Thapa, Chandra; Jang, Seung Ick; Ahmed, Muhammad Ejaz; Camtepe, Seyit; Pieprzyk, Josef; Nepal, Surya. Proceedings of the 38th Annual Computer Security Applications Conference (ACSAC 2022), 2022: 481-496.
  • [4] Transformer-based Extraction of Deep Image Models. Battis, Verena; Penner, Alexander. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P 2022), 2022: 320-336.
  • [5] Transformer-based models for multimodal irony detection. Tomás, D.; Ortega-Bueno, R.; Zhang, G.; Rosso, P.; Schifanella, R. Journal of Ambient Intelligence and Humanized Computing, 2023, 14(6): 7399-7410.
  • [6] Adaptation of Transformer-Based Models for Depression Detection. Adebanji, Olaronke O.; Ojo, Olumide E.; Calvo, Hiram; Gelbukh, Irina; Sidorov, Grigori. Computación y Sistemas, 2024, 28(1): 151-165.
  • [7] Ouroboros: On Accelerating Training of Transformer-Based Language Models. Yang, Qian; Huo, Zhouyuan; Wang, Wenlin; Huang, Heng; Carin, Lawrence. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019.
  • [8] A Comparison of Transformer-Based Language Models on NLP Benchmarks. Greco, Candida Maria; Tagarelli, Andrea; Zumpano, Ester. Natural Language Processing and Information Systems (NLDB 2022), 2022, 13286: 490-501.
  • [9] TAG: Gradient Attack on Transformer-based Language Models. Deng, Jieren; Wang, Yijue; Li, Ji; Wang, Chenghong; Shang, Chao; Liu, Hang; Rajasekaran, Sanguthevar; Ding, Caiwen. Findings of the Association for Computational Linguistics: EMNLP 2021, 2021: 3600-3610.
  • [10] Applications of transformer-based language models in bioinformatics: a survey. Zhang, Shuang; Fan, Rui; Liu, Yuti; Chen, Shuang; Liu, Qiao; Zeng, Wanwen. Bioinformatics Advances, 2023, 3(1).