Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models

Cited by: 4
Authors
Shyr, Cathy [1]
Hu, Yan [2]
Bastarache, Lisa [1]
Cheng, Alex [1]
Hamid, Rizwan [3]
Harris, Paul [1,4,5]
Xu, Hua [6]
Affiliations
[1] Vanderbilt Univ, Med Ctr, Dept Biomed Informat, Nashville, TN 37203 USA
[2] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, Houston, TX 77225 USA
[3] Vanderbilt Univ, Med Ctr, Div Med Genet & Genom Med, Nashville, TN 37203 USA
[4] Vanderbilt Univ, Med Ctr, Dept Biostat, Nashville, TN 37203 USA
[5] Vanderbilt Univ, Med Ctr, Dept Biomed Engn, 2525 West End Ave, Nashville, TN 37203 USA
[6] Yale Sch Med, Sect Biomed Informat & Data Sci, 100 Coll St, New Haven, CT 06510 USA
Funding
US National Institutes of Health;
Keywords
Natural language processing; ChatGPT; Rare disease; Artificial intelligence; Prompt learning; Large language model; HEALTH;
DOI
10.1007/s41666-023-00155-0
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Purpose: Phenotyping is critical for informing rare disease diagnosis and treatment, but disease phenotypes are often embedded in unstructured text. While natural language processing (NLP) can automate extraction, a major bottleneck is developing annotated corpora. Recently, prompt learning with large language models (LLMs) has been shown to yield generalizable results with no annotated samples (zero-shot) or only a few (few-shot), but no prior work has explored this for rare diseases. Ours is the first study of prompt learning for identifying and extracting rare disease phenotypes in the zero- and few-shot settings.
Methods: We compared the performance of prompt learning with ChatGPT against fine-tuning with BioClinicalBERT. We engineered novel prompts for ChatGPT to identify and extract rare diseases and their phenotypes (e.g., diseases, symptoms, and signs), established a benchmark for evaluating its performance, and conducted an in-depth error analysis.
Results: Overall, fine-tuning BioClinicalBERT resulted in higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.610 in the zero- and few-shot settings, respectively). However, ChatGPT achieved higher accuracy for rare diseases and signs in the one-shot setting (F1 of 0.778 and 0.725). Conversational, sentence-based prompts generally achieved higher accuracy than structured lists.
Conclusion: Prompt learning using ChatGPT has the potential to match or outperform fine-tuning BioClinicalBERT at extracting rare diseases and signs with just one annotated sample. Given its accessibility, ChatGPT could be leveraged to extract these entities without relying on a large, annotated corpus. While LLMs can support rare disease phenotyping, researchers should critically evaluate model outputs to ensure phenotyping accuracy.
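The one-shot, conversational prompting approach the abstract describes (one annotated example followed by a target sentence) can be sketched as follows. The template wording, entity type labels, example sentences, and the `<entity>: <type>` output convention are illustrative assumptions, not the authors' actual prompts or label scheme:

```python
# Minimal sketch of one-shot prompt construction for rare disease and
# phenotype extraction. No API call is made here; the string would be
# sent to an LLM (e.g., ChatGPT) in a real pipeline.

def build_one_shot_prompt(example_text, example_annotation, target_text):
    """Compose a conversational, sentence-based extraction prompt."""
    return (
        "You are extracting rare diseases and their phenotypes "
        "(diseases, symptoms, and signs) from clinical text.\n\n"
        f"Example text: {example_text}\n"
        f"Example annotation: {example_annotation}\n\n"
        f"Now annotate this text: {target_text}\n"
        "List each entity on its own line as <entity>: <type>."
    )

def parse_response(response):
    """Parse '<entity>: <type>' lines from a model response."""
    entities = []
    for line in response.splitlines():
        if ":" in line:
            name, _, etype = line.partition(":")
            entities.append((name.strip(), etype.strip().lower()))
    return entities

prompt = build_one_shot_prompt(
    "The patient has cystic fibrosis with a chronic cough.",
    "cystic fibrosis: rare disease; chronic cough: symptom",
    "He was diagnosed with Marfan syndrome and reports joint pain.",
)
```

Extraction quality would then be scored against a gold-standard annotation (e.g., with entity-level F1), which is how the paper benchmarks ChatGPT against fine-tuned BioClinicalBERT.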
Pages: 438-461
Page count: 24
Related Papers
50 items in total
  • [1] Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models
    Cathy Shyr
    Yan Hu
    Lisa Bastarache
    Alex Cheng
    Rizwan Hamid
    Paul Harris
    Hua Xu
    [J]. Journal of Healthcare Informatics Research, 2024, 8 : 438 - 461
  • [2] Extracting Training Data from Large Language Models
    Carlini, Nicholas
    Tramer, Florian
    Wallace, Eric
    Jagielski, Matthew
    Herbert-Voss, Ariel
    Lee, Katherine
    Roberts, Adam
    Brown, Tom
    Song, Dawn
    Erlingsson, Ulfar
    Oprea, Alina
    Raffel, Colin
    [J]. PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, 2021, : 2633 - 2650
  • [3] PROSPER: Extracting Protocol Specifications Using Large Language Models
    Sharma, Prakhar
    Yegneswaran, Vinod
    [J]. PROCEEDINGS OF THE 22ND ACM WORKSHOP ON HOT TOPICS IN NETWORKS, HOTNETS 2023, 2023, : 41 - 47
  • [4] Extracting Domain Models from Textual Requirements in the Era of Large Language Models
    Arulmohan, Sathurshan
    Meurs, Marie-Jean
    Mosser, Sebastien
    [J]. 2023 ACM/IEEE INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS COMPANION, MODELS-C, 2023, : 580 - 587
  • [5] Identifying Rare Events in Rare Diseases
    Attiyeh, Edward F.
    Maris, John M.
    [J]. CLINICAL CANCER RESEARCH, 2015, 21 (08) : 1782 - 1785
  • [6] Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts
    Patel, Het
    Reiman, Umair
    Iqbal, Farkhund
    [J]. 2024 16TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION, HSI 2024, 2024,
  • [7] Identifying symptom etiologies using syntactic patterns and large language models
    Taub-Tabib, Hillel
    Shamay, Yosi
    Shlain, Micah
    Pinhasov, Menny
    Polak, Mark
    Tiktinsky, Aryeh
    Rahamimov, Sigal
    Bareket, Dan
    Eyal, Ben
    Kassis, Moriya
    Goldberg, Yoav
    Rosenberg, Tal Kaminski
    Vulfsons, Simon
    Ben Sasson, Maayan
    [J]. SCIENTIFIC REPORTS, 2024, 14 (01):
  • [8] Pain Phenotypes in Rare Musculoskeletal and Neuromuscular Diseases
    Tucker-Bartley, Anthony
    Lemme, Jordan
    Gomez-Morad, Andrea
    Shah, Nehal
    Veliu, Miranda
    Birklein, Frank
    Storz, Claudia
    Rutkove, Seward
    Kronn, David
    Boyce, Alison M.
    Kraft, Eduard
    Upadhyay, Jaymin
    [J]. NEUROSCIENCE AND BIOBEHAVIORAL REVIEWS, 2021, 124 : 267 - 290
  • [9] Letter: The use of large language models as medical chatbots in digestive diseases
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    [J]. ALIMENTARY PHARMACOLOGY & THERAPEUTICS, 2024,
  • [10] Extracting Legal Norm Analysis Categories from German Law Texts with Large Language Models
    Bachinger, Sarah T.
    Feddoul, Leila
    Mauch, Marianne
    Koenig-Ries, Birgitta
    [J]. PROCEEDINGS OF THE 25TH ANNUAL INTERNATIONAL CONFERENCE ON DIGITAL GOVERNMENT RESEARCH, DGO 2024, 2024, : 481 - 493