Annotating Named Entities in Consumer Health Questions

Cited by: 0
Authors
Kilicoglu, Halil [1]
Ben Abacha, Asma [1]
Mrabet, Yassine [1]
Roberts, Kirk [2]
Rodriguez, Laritza [1]
Shooshan, Sonya E. [1]
Demner-Fushman, Dina [1]
Affiliations
[1] NIH, Lister Hill Natl Ctr Biomed Commun, Natl Lib Med, Bldg 10, Bethesda, MD 20892 USA
[2] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
Funding
National Institutes of Health (USA);
Keywords
consumer health questions; biomedical named entities; assisted annotation; nested entities;
DOI
Not available
Chinese Library Classification number
H [Language, Linguistics];
Discipline classification code
05;
Abstract
We describe a corpus of consumer health questions annotated with named entities. The corpus consists of 1548 de-identified questions about diseases and drugs, written in English. We defined 15 broad categories of biomedical named entities for annotation. A pilot annotation phase, in which a small portion of the corpus was double-annotated by four annotators, was followed by a main phase in which double annotation was carried out by six annotators, and by a reconciliation phase in which all annotations were reconciled by an expert. We conducted the annotation in two modes, manual and assisted, to assess the effect of automatic pre-annotation, and we calculated inter-annotator agreement. We obtained moderate inter-annotator agreement; assisted annotation yielded slightly better agreement and fewer missed annotations than manual annotation. Because of the complex nature of biomedical entities, we paid particular attention to nested entities, for which we obtained slightly lower inter-annotator agreement, confirming that annotating nested entities is somewhat more challenging. To our knowledge, the corpus is the first of its kind for consumer health text, and it is publicly available.
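The abstract does not say which agreement measure was used. As an illustration only, the sketch below computes pairwise exact-match F1, a measure often used for entity annotation, and flags nested entities. The `Entity` type, the function names, the example question, and the labels `PROBLEM` and `ANATOMY` are all hypothetical and are not taken from the corpus or its 15 categories.

```python
from typing import NamedTuple, Set


class Entity(NamedTuple):
    """One annotated span: character offsets plus a category label (hypothetical schema)."""
    start: int
    end: int
    label: str


def pairwise_f1(a: Set[Entity], b: Set[Entity]) -> float:
    """Exact-match F1 between two annotators, treating annotator `a` as the reference."""
    if not a and not b:
        return 1.0  # both annotators marked nothing: trivially in agreement
    matched = len(a & b)  # spans with identical offsets and label
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


def is_nested(entity: Entity, others: Set[Entity]) -> bool:
    """True if `entity` lies inside a different, strictly larger span in `others`."""
    return any(
        o.start <= entity.start
        and entity.end <= o.end
        and (o.start, o.end) != (entity.start, entity.end)
        for o in others
    )


# Hypothetical question text: "treatment for breast cancer"
# "breast cancer" spans offsets 14-27; "breast" spans offsets 14-20.
annotator_a = {Entity(14, 27, "PROBLEM"), Entity(14, 20, "ANATOMY")}
annotator_b = {Entity(14, 27, "PROBLEM")}  # missed the nested entity

agreement = pairwise_f1(annotator_a, annotator_b)
nested = {e for e in annotator_a if is_nested(e, annotator_a)}
```

In this made-up case the second annotator misses the nested `breast` span, so agreement drops below 1 even though both agree on the outer entity. That is the kind of disagreement the abstract associates with nested entities.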
Pages: 3325-3332
Number of pages: 8