The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions

Cited by: 20
Authors
Dogan, Rezarta Islamaj [1]
Kim, Sun [1]
Chatr-aryamontri, Andrew [2]
Chang, Christie S. [3]
Oughtred, Rose [3]
Rust, Jennifer [3]
Wilbur, W. John [1]
Comeau, Donald C. [1]
Dolinski, Kara [3]
Tyers, Mike [2, 4]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[2] Univ Montreal, Inst Res Immunol & Canc, Montreal, PQ H3C 3J7, Canada
[3] Princeton Univ, Lewis Sigler Inst Integrat Genom, Princeton, NJ 08544 USA
[4] Mt Sinai Hosp, Lunenfeld Tanenbaum Res Inst, Toronto, ON, Canada
Funding
Biotechnology and Biological Sciences Research Council (UK); National Institutes of Health (USA)
Keywords
BIOLOGY; CHALLENGE; COMMUNITY;
DOI
10.1093/database/baw147
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a noncompetitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein-protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements.
The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report.
Pages: 11