NTTMUNSW BioC modules for recognizing and normalizing species and gene/protein mentions

Authors
Dai, Hong-Jie [1 ,2 ]
Singh, Onkar [3 ]
Jonnagaddala, Jitendra [4 ,5 ]
Su, Emily Chia-Yu [3 ]
Affiliations
[1] Natl Taitung Univ, Dept Comp Sci & Informat Engn, Taitung, Taiwan
[2] Natl Taitung Univ, Interdisciplinary Program Green & Informat Techno, Taitung, Taiwan
[3] Taipei Med Univ, Grad Inst Biomed Informat, Coll Med Sci & Technol, Taipei, Taiwan
[4] Univ New South Wales, Sch Publ Hlth & Community Med, Sydney, NSW, Australia
[5] Univ New South Wales, Prince Wales Clin Sch, Sydney, NSW, Australia
Keywords
GENE NORMALIZATION;
DOI
10.1093/database/baw111
Chinese Library Classification (CLC) number
Q [Biological sciences];
Subject classification codes
07; 0710; 09;
Abstract
In recent years, the number of published biomedical articles has grown as researchers investigate the functions of biological objects such as genes and proteins. However, the ambiguous naming of genes and their products has made the literature harder to use for readers and for curators of molecular interaction databases. To address this challenge, we applied normalization, a technique that links variant mentions of a biological object to a single, standardized form. In this work, we developed a species normalization module that recognizes species names and normalizes them to NCBI Taxonomy IDs. Most previous work ignored gene-name prefixes that abbreviate the species to which a gene belongs. In contrast, our module's recognition results also include these prefixed species. The species normalization module achieved an overall F-score of 0.954 on an instance-level species normalization corpus. For gene normalization, two separate modules were employed. One recognizes gene mentions, and the other normalizes those mentions to Entrez Gene IDs using a multistage normalization algorithm developed for processing full-text articles. All of the developed modules are BioC-compatible .NET Framework libraries and are publicly available from the NuGet Gallery.
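The abstract does not say how prefixed species are detected or exactly how instances are scored. The Python sketch below shows one possible dictionary-based approach under stated assumptions:

- The dictionary is a tiny hand-written name list. A real module would load full NCBI Taxonomy names.
- It follows the common one-letter prefix convention on gene symbols (h = human, m = mouse, r = rat).
- Scoring uses exact span-plus-ID matching.

`SPECIES_NAMES`, `GENE_PREFIXES`, `PREFIX_STOPLIST`, `find_species`, and `instance_f_score` are hypothetical names. They are not the API of the published NuGet packages, and this is not the authors' algorithm.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical toy dictionary (assumption): a real module would load the full
# NCBI Taxonomy name list. Values are NCBI Taxonomy IDs.
SPECIES_NAMES = {
    "human": 9606,
    "homo sapiens": 9606,
    "mouse": 10090,
    "mus musculus": 10090,
    "rat": 10116,
    "rattus norvegicus": 10116,
}

# Hypothetical one-letter species prefixes on gene symbols, e.g. "hIL-2" -> human.
GENE_PREFIXES = {"h": 9606, "m": 10090, "r": 10116}

# Tokens that look species-prefixed but are not gene names.
PREFIX_STOPLIST = {"mRNA", "rRNA"}

# (start offset, end offset, surface text, NCBI Taxonomy ID)
Mention = Tuple[int, int, str, int]

# A lowercase h/m/r at a word start, followed by an uppercase gene symbol.
PREFIX_PATTERN = re.compile(r"\b([hmr])[A-Z][A-Z0-9]+\b")


def find_species(text: str) -> List[Mention]:
    """Return species mentions, including one-letter prefixes of gene symbols."""
    mentions: List[Mention] = []
    lowered = text.lower()  # assumes ASCII text so offsets stay aligned
    for name, taxid in SPECIES_NAMES.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            mentions.append((m.start(), m.end(), text[m.start():m.end()], taxid))
    for m in PREFIX_PATTERN.finditer(text):
        if m.group(0) in PREFIX_STOPLIST:
            continue
        # Only the prefix letter itself is reported as the species mention.
        mentions.append((m.start(1), m.end(1), m.group(1), GENE_PREFIXES[m.group(1)]))
    return sorted(mentions)


def instance_f_score(predicted: Iterable[Tuple[int, int, int]],
                     gold: Iterable[Tuple[int, int, int]]) -> float:
    """Instance-level F-score over (start, end, taxid) triples.

    Assumption: an instance is a true positive only when both its span and
    its Taxonomy ID exactly match a gold instance.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "Human hIL-2 mRNA was expressed in Mus musculus cells."
    found = find_species(text)
    print(found)
    # [(0, 5, 'Human', 9606), (6, 7, 'h', 9606), (34, 46, 'Mus musculus', 10090)]
    gold = [(0, 5, 9606), (6, 7, 9606), (34, 46, 10090)]
    print(instance_f_score([(s, e, t) for s, e, _, t in found], gold))  # 1.0
```

The stoplist matters for this kind of design. Without it, tokens such as "mRNA" match the prefix pattern and would be reported as mouse mentions. A learned or context-aware recognizer would handle such cases more robustly than a fixed list.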
Pages: 10