Semantic annotation of consumer health questions

Cited by: 22
Authors
Kilicoglu, Halil [1 ]
Ben Abacha, Asma [1 ]
Mrabet, Yassine [1 ]
Shooshan, Sonya E. [1 ]
Rodriguez, Laritza [1 ]
Masterton, Kate [1 ]
Demner-Fushman, Dina [1 ]
Affiliations
[1] US Natl Lib Med, Lister Hill Natl Ctr Biomed Commun, 8600 Rockville Pike, Bethesda, MD 20894 USA
Source
BMC BIOINFORMATICS, 2018, Vol. 19
Funding
US National Institutes of Health;
Keywords
Consumer health informatics; Question answering; Corpus annotation; Annotation confidence modeling; CLINICAL QUESTIONS; CORPUS; CARE;
DOI
10.1186/s12859-018-2045-1
Chinese Library Classification
Q5 [Biochemistry];
Discipline codes
071010; 081704;
Abstract
Background: Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can be expressed fully only in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from natural language questions (question understanding). Developing effective question understanding tools requires question corpora semantically annotated for relevant question elements. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topics. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to the MedlinePlus search engine as queries. Each question was annotated by two annotators. The annotation methodology is largely the same for the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we report corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations.
Results: The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded the highest agreement, while agreement for the more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than in CHQA-email. Pairwise inter-annotator agreement proved most useful in estimating annotation confidence.
Conclusions: To our knowledge, ours is the first corpus focusing on the annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
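The abstract reports pairwise inter-annotator agreement without naming the statistic; a standard chance-corrected measure for two annotators over categorical labels (such as question types) is Cohen's kappa. The sketch below is a minimal illustration of that measure, not the paper's actual implementation, and the label values are invented for the example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on categorical labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical question-type labels from two annotators over ten questions.
a = ["treatment", "info", "treatment", "cause", "info",
     "treatment", "info", "cause", "treatment", "info"]
b = ["treatment", "info", "cause", "cause", "info",
     "treatment", "info", "treatment", "treatment", "info"]
print(round(cohens_kappa(a, b), 4))  # → 0.6875
```

Here the annotators agree on 8 of 10 items (observed agreement 0.8), but because both favor a few frequent labels, chance agreement is 0.36, yielding kappa of about 0.69; this chance correction is why agreement on frequent, coarse categories (types, topics) can look stronger than agreement on sparse frame annotations.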
Pages: 28
Related papers
50 items
  • [1] Semantic annotation of consumer health questions
    Halil Kilicoglu
    Asma Ben Abacha
    Yassine Mrabet
    Sonya E. Shooshan
    Laritza Rodriguez
    Kate Masterton
    Dina Demner-Fushman
    [J]. BMC Bioinformatics, 19
  • [2] Semantic Chunk Annotation for questions using Maximum Entropy
    Fan, Shixi
    Zhang, Yaoyun
    Ng, Wing W. Y.
    Wang, Xuan
    Wang, Xiaolong
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 450 - 454
  • [3] Semantic representation of consumer questions and physician answers
    Slaughter, Laura A.
    Soergel, Dagobert
    Rindflesch, Thomas C.
    [J]. INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2006, 75 (07) : 513 - 529
  • [4] On the Summarization of Consumer Health Questions
    Ben Abacha, Asma
    Demner-Fushman, Dina
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 2228 - 2234
  • [5] Toward Dialogue Modeling: A Semantic Annotation Scheme for Questions and Answers
    Blandon, Maria Andrea Cruz
    Minnema, Gosse
    Nourbakhsh, Aria
    Boritchev, Maria
    Amblard, Maxime
    [J]. 13TH LINGUISTIC ANNOTATION WORKSHOP (LAW XIII), 2019, : 230 - 235
  • [6] Framework for Analyzing Consumer Health Questions
    Cao, J.
    [J]. DRUG SAFETY, 2018, 41 (11) : 1150 - 1150
  • [7] BERT-Assisted Semantic Annotation Correction for Emotion-Related Questions
    Kazemzadeh, Abe
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS AND OTHER AFFILIATED EVENTS (PERCOM WORKSHOPS), 2022,
  • [8] Annotating Named Entities in Consumer Health Questions
    Kilicoglu, Halil
    Ben Abacha, Asma
    Mrabet, Yassine
    Roberts, Kirk
    Rodriguez, Laritza
    Shooshan, Sonya E.
    Demner-Fushman, Dina
    [J]. LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3325 - 3332
  • [9] Annotating Question Types for Consumer Health Questions
    Roberts, Kirk
    Masterton, Kate
    Fiszman, Marcelo
    Kilicoglu, Halil
    Demner-Fushman, Dina
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,
  • [10] Understanding questions and finding answers: semantic relation annotation to compute the Expected Answer Type
    Petukhova, Volha
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,