Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework

Cited by: 29
Authors
Lybarger, Kevin [1]
Ostendorf, Mari [2]
Thompson, Matthew [3]
Yetisgen, Meliha [1]
Affiliations
[1] Univ Washington, Biomed & Hlth Informat, Box 358047, Seattle, WA 98109 USA
[2] Univ Washington, Dept Elect & Comp Engn, Campus Box 352500 185, Seattle, WA 98195 USA
[3] Univ Washington, Dept Family Med, Box 354696, Seattle, WA 98195 USA
Funding
U.S. National Institutes of Health;
Keywords
COVID-19; Coronavirus; Machine learning; Natural language processing; Information extraction; METAMAP;
DOI
10.1016/j.jbi.2021.103761
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). Our span-based event extraction model outperforms an extractor built on MetaMapLite for the identification of symptoms with assertion values. In a secondary use application, we predicted COVID-19 test results using structured patient data (e.g. vital signs and laboratory results) and automatically extracted symptom information, to explore the clinical presentation of COVID-19. Automatically extracted symptoms improve COVID-19 prediction performance, beyond structured data alone.
Pages: 13