Leveraging large language models for medical text classification: a hospital readmission prediction case

Cited by: 0
Authors
Nazyrova, Nodira [1 ]
Chahed, Salma [1 ]
Chausalet, Thierry [1 ]
Dwek, Miriam [2 ]
Affiliations
[1] Univ Westminster, Sch Comp Sci & Engn, London, England
[2] Univ Westminster, Sch Life Sci, London, England
Keywords
hospital readmission prediction; domain-specific transformer models; BERT; ClinicalBERT; SciBERT; BioBERT; large language models
DOI
10.1109/ICPRS62101.2024.10677826
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In recent years, the intersection of natural language processing (NLP) and healthcare informatics has witnessed a revolutionary transformation. One of the most significant developments in this realm is the advent of large language models (LLMs), which have demonstrated remarkable capabilities in analysing clinical data. This paper explores the potential of large language models for medical text classification, shedding light on their ability to discern subtle patterns, grasp domain-specific terminology, and adapt to the dynamic nature of medical information. The research focuses on applying transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), to hospital discharge summaries to predict 30-day readmissions among older adults. In particular, we explore the role of transfer learning in medical text classification and compare domain-specific transformer models, namely SciBERT, BioBERT and ClinicalBERT. We also analyse how data preprocessing techniques affect the performance of language models. Our comparative analysis shows that removing parts of the text with a large proportion of out-of-vocabulary words improves classification results. We further investigate how input sequence length affects model performance, varying the sequence length from 128 to 512 for BERT-based models and using a length of 4096 for the Longformer. The results show that, among the compared models, SciBERT yields the best performance in the medical domain, improving hospital readmission prediction using clinical notes on MIMIC data from 0.714 to 0.735 AUROC. Our next step is to pretrain a model on a large corpus of clinical notes, which could improve the adaptability of a language model to the medical domain and achieve better results on downstream tasks.
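The abstract reports that discarding text segments with a high proportion of out-of-vocabulary (OOV) words improved classification. The paper's exact procedure is not given here; the following is a minimal sketch of one plausible form of that preprocessing, assuming a vocabulary set (in practice this would be the tokenizer's WordPiece vocabulary) and a hypothetical `oov_threshold` parameter:

```python
def filter_high_oov_segments(text, vocab, oov_threshold=0.5):
    """Keep only the segments of a note whose fraction of
    out-of-vocabulary tokens is at or below oov_threshold."""
    kept = []
    for segment in text.split("\n"):
        tokens = segment.lower().split()
        if not tokens:
            continue  # skip blank lines
        oov = sum(1 for t in tokens if t not in vocab)
        if oov / len(tokens) <= oov_threshold:
            kept.append(segment)
    return "\n".join(kept)

# Toy vocabulary and discharge-note fragment for illustration only.
vocab = {"patient", "admitted", "with", "chest", "pain",
         "discharged", "in", "stable", "condition"}
note = ("Patient admitted with chest pain\n"
        "qd prn bid tid hs ac pc\n"          # mostly-OOV abbreviation run
        "Discharged in stable condition")
print(filter_high_oov_segments(note, vocab))
# Drops the middle line, since all of its tokens are OOV.
```

A real pipeline would apply the same idea using the model tokenizer's vocabulary and tune the threshold on a validation split; this sketch only illustrates the filtering logic.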
Pages: 7