Corpus annotation for mining biomedical events from literature

Times cited: 155
Authors
Kim, Jin-Dong [1]
Ohta, Tomoko [1]
Tsujii, Jun'ichi [1,2,3]
Affiliations
[1] Univ Tokyo, Sch Informat Sci & Technol, Dept Comp Sci, Tokyo, Japan
[2] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[3] Univ Manchester, Natl Ctr Text Min, Manchester, Lancs, England
DOI
10.1186/1471-2105-9-10
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Advanced Text Mining (TM) tasks such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have attracted increasing attention in the biomedical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, because of the complexity of the task, semantic annotation beyond relatively simple term annotation has never been attempted on a large scale.

Results: We have completed a new type of semantic annotation, event annotation, which adds to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation covers half of the GENIA corpus, namely 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design an annotation scheme that meets the specific requirements of text annotation, (2) to achieve biology-oriented annotation that reflects biologists' interpretation of text, and (3) to ensure homogeneous annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which collectively contributed to the successful completion of a large-scale annotation.

Conclusion: The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the biomedical domain.
Pages: 25