Corpus annotation for mining biomedical events from literature

Cited by: 155
Authors
Kim, Jin-Dong [1 ]
Ohta, Tomoko [1 ]
Tsujii, Jun'ichi [1 ,2 ,3 ]
Affiliations
[1] Univ Tokyo, Sch Informat Sci & Technol, Dept Comp Sci, Tokyo, Japan
[2] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[3] Univ Manchester, Natl Ctr Text Min, Manchester, Lancs, England
DOI
10.1186/1471-2105-9-10
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Advanced Text Mining (TM) tasks such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation.
Results: We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design an annotation scheme which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to the successful completion of a large-scale annotation.
Conclusion: The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.
Pages: 25
Related papers
50 in total
  • [31] Mining Faces from Biomedical Literature using Deep Learning
    Dawson, Mitchell
    Zisserman, Andrew
    Nellaker, Christoffer
    ACM-BCB' 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2017, : 562 - 567
  • [32] New directions in biomedical text annotation: definitions, guidelines and corpus construction
    Wilbur, W. John
    Rzhetsky, Andrey
    Shatkay, Hagit
    BMC BIOINFORMATICS, 2006, 7 (1)
  • [33] Assessing citation integrity in biomedical publications: corpus annotation and NLP models
    Sarol, Maria Janina
    Ming, Shufan
    Radhakrishna, Shruthan
    Schneider, Jodi
    Kilicoglu, Halil
    BIOINFORMATICS, 2024, 40 (07)
  • [34] Recent advances in biomedical literature mining
    Zhao, Sendong
    Su, Chang
    Lu, Zhiyong
    Wang, Fei
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (03)
  • [35] A statistical framework for biomedical literature mining
    Chung, Dongjun
    Lawson, Andrew
    Zheng, W. Jim
    STATISTICS IN MEDICINE, 2017, 36 (22) : 3461 - 3474
  • [36] Mining biomarker information in biomedical literature
    Erfan Younesi
    Luca Toldo
    Bernd Müller
    Christoph M Friedrich
    Natalia Novac
    Alexander Scheer
    Martin Hofmann-Apitius
    Juliane Fluck
    BMC Medical Informatics and Decision Making, 12
  • [37] Mining biomarker information in biomedical literature
    Younesi, Erfan
    Toldo, Luca
    Mueller, Bernd
    Friedrich, Christoph M.
    Novac, Natalia
    Scheer, Alexander
    Hofmann-Apitius, Martin
    Fluck, Juliane
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2012, 12
  • [38] A semantic-based workflow for biomedical literature annotation
    Sernadela, Pedro
    Oliveira, Jose Luis
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2017,
  • [39] Extracting Sentences Describing Biomolecular Events from the Biomedical Literature
    Nunes, Tiago
    Matos, Sergio
    Oliveira, Jose Luis
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 11TH INTERNATIONAL CONFERENCE, 2014, 290 : 417 - 424
  • [40] A Human-in-the-Loop Method for Annotation of Events in Biomedical Signals
    Seeuws, Nick
    De Vos, Maarten
    Bertrand, Alexander
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2025, 29 (01) : 95 - 106