Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework

Cited by: 29
Authors
Lybarger, Kevin [1]
Ostendorf, Mari [2]
Thompson, Matthew [3]
Yetisgen, Meliha [1]
Affiliations
[1] Univ Washington, Biomed & Hlth Informat, Box 358047, Seattle, WA 98109 USA
[2] Univ Washington, Dept Elect & Comp Engn, Campus Box 352500 185, Seattle, WA 98195 USA
[3] Univ Washington, Dept Family Med, Box 354696, Seattle, WA 98195 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
COVID-19; Coronavirus; Machine learning; Natural language processing; Information extraction; METAMAP;
DOI
10.1016/j.jbi.2021.103761
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). Our span-based event extraction model outperforms an extractor built on MetaMapLite for the identification of symptoms with assertion values. In a secondary use application, we predicted COVID-19 test results using structured patient data (e.g. vital signs and laboratory results) and automatically extracted symptom information, to explore the clinical presentation of COVID-19. Automatically extracted symptoms improve COVID-19 prediction performance, beyond structured data alone.
Pages: 13