Annotating Named Entities in Consumer Health Questions

Cited by: 0
Authors
Kilicoglu, Halil [1]
Ben Abacha, Asma [1]
Mrabet, Yassine [1]
Roberts, Kirk [2]
Rodriguez, Laritza [1]
Shooshan, Sonya E. [1]
Demner-Fushman, Dina [1]
Affiliations
[1] NIH, Lister Hill Natl Ctr Biomed Commun, Natl Lib Med, Bldg 10, Bethesda, MD 20892 USA
[2] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
Funding
National Institutes of Health (USA)
Keywords
consumer health questions; biomedical named entities; assisted annotation; nested entities
DOI
Not available
Chinese Library Classification number
H [Language, Linguistics]
Discipline classification code
05
Abstract
We describe a corpus of consumer health questions annotated with named entities. The corpus consists of 1548 de-identified questions about diseases and drugs, written in English. We defined 15 broad categories of biomedical named entities for annotation. Annotation proceeded in three phases: a pilot phase, in which four annotators double-annotated a small portion of the corpus; a main phase, in which six annotators carried out double annotation; and a reconciliation phase, in which an expert reconciled all annotations. We conducted the annotation in two modes, manual and assisted, to assess the effect of automatic pre-annotation, and calculated inter-annotator agreement. We obtained moderate inter-annotator agreement; assisted annotation yielded slightly better agreement and fewer missed annotations than manual annotation. Due to the complex nature of biomedical entities, we paid particular attention to nested entities, for which we obtained slightly lower inter-annotator agreement, confirming that annotating nested entities is somewhat more challenging. To our knowledge, the corpus is the first of its kind for consumer health text, and it is publicly available.
Pages: 3325-3332
Page count: 8
    Grac, Marek
    [J]. RASLAN 2014: RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING, 2014, : 91 - 96