Semantic annotation of consumer health questions

Cited by: 22
Authors
Kilicoglu, Halil [1 ]
Ben Abacha, Asma [1 ]
Mrabet, Yassine [1 ]
Shooshan, Sonya E. [1 ]
Rodriguez, Laritza [1 ]
Masterton, Kate [1 ]
Demner-Fushman, Dina [1 ]
Affiliations
[1] US Natl Lib Med, Lister Hill Natl Ctr Biomed Commun, 8600 Rockville Pike, Bethesda, MD 20894 USA
Source
BMC BIOINFORMATICS | 2018, Vol. 19
Funding
US National Institutes of Health
Keywords
Consumer health informatics; Question answering; Corpus annotation; Annotation confidence modeling; CLINICAL QUESTIONS; CORPUS; CARE;
DOI
10.1186/s12859-018-2045-1
Chinese Library Classification
Q5 [Biochemistry];
Subject Classification Codes
071010; 081704;
Abstract
Background: Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only be fully expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to the MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations.

Results: The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded the highest agreement, while agreement for the more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most useful in estimating annotation confidence.

Conclusions: To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
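The abstract highlights pairwise inter-annotator agreement as the most useful signal for estimating annotation confidence, with each question labeled by two annotators. The exact agreement statistic used by the authors is not given in this record; the sketch below assumes a simple chance-corrected measure (Cohen's kappa) over two annotators' question-type labels. The label names, the `cohen_kappa` helper, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Toy example with hypothetical question-type labels for five questions.
ann_a = ["treatment", "information", "cause", "treatment", "information"]
ann_b = ["treatment", "information", "treatment", "treatment", "diagnosis"]
print(f"Pairwise agreement (Cohen's kappa): {cohen_kappa(ann_a, ann_b):.2f}")
```

Computed per annotation category (question types, topics, frames), such a pairwise statistic could serve as the confidence signal the abstract describes: categories with lower agreement would be treated as lower-confidence annotations in the absence of adjudication.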
Pages: 28
Related Papers
50 records in total
  • [41] KIM - Semantic annotation platform
    Popov, B
    Kiryakov, A
    Kirilov, A
    Manov, D
    Ognyanoff, D
    Goranov, M
    [J]. SEMANTIC WEB - ISWC 2003, 2003, 2870 : 834 - 849
  • [42] Detecting Errors in Semantic Annotation
    Dickinson, Markus
    Lee, Chong Min
    [J]. SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 605 - 610
  • [43] Instantiation of relations for semantic annotation
    Tenier, S.
    Toussaint, Y.
    Napoli, A.
    Polanco, X.
    [J]. 2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 463 - +
  • [44] Semantic annotation, indexing, and retrieval
    Kiryakov, A
    Popov, B
    Ognyanoff, D
    Manov, D
    Kirilov, A
    Goranov, M
    [J]. SEMANTIC WEB - ISWC 2003, 2003, 2870 : 484 - 499
  • [45] Semantic Annotation of Service Choreographies
    Vinh Thinh Luu
    Ciuciu, Ioana
    [J]. ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2014 WORKSHOPS, 2014, 8842 : 489 - 493
  • [46] A classification of semantic annotation systems
    Andrews, Pierre
    Zaihrayeu, Ilya
    Pane, Juan
    [J]. SEMANTIC WEB, 2012, 3 (03) : 223 - 248
  • [47] A Case for Semantic Annotation Of EHR
    Sreeninvasan, M.
    Chacko, Anu Mary
    [J]. 2020 IEEE 44TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2020), 2020, : 1363 - 1367
  • [48] Inconsistency Detection in Semantic Annotation
    Hollenstein, Nora
    Schneider, Nathan
    Webber, Bonnie
    [J]. LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3986 - 3990
  • [49] Semantic annotation and revision control
    Bahreini, Kiavash
    Elci, Atilla
    [J]. WEBIST 2008: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 2, 2008, : 294 - 297
  • [50] Semantic Annotation for Java
    Lyon, Douglas
    [J]. JOURNAL OF OBJECT TECHNOLOGY, 2010, 9 (03): : 19 - 29