NTTMUNSW BioC modules for recognizing and normalizing species and gene/protein mentions

Authors
Dai, Hong-Jie [1, 2]
Singh, Onkar [3]
Jonnagaddala, Jitendra [4, 5]
Su, Emily Chia-Yu [3]
Affiliations
[1] Natl Taitung Univ, Dept Comp Sci & Informat Engn, Taitung, Taiwan
[2] Natl Taitung Univ, Interdisciplinary Program Green & Informat Techno, Taitung, Taiwan
[3] Taipei Med Univ, Grad Inst Biomed Informat, Coll Med Sci & Technol, Taipei, Taiwan
[4] Univ New South Wales, Sch Publ Hlth & Community Med, Sydney, NSW, Australia
[5] Univ New South Wales, Prince Wales Clin Sch, Sydney, NSW, Australia
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2016
Keywords
GENE NORMALIZATION;
DOI
10.1093/database/baw111
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
In recent years, the number of published biomedical articles has increased as researchers have focused on biological domains to investigate the functions of biological objects, such as genes and proteins. However, the ambiguous nature of genes and their products has rendered the literature more complex for readers and curators of molecular interaction databases. To address this challenge, a normalization technique that can link variants of biological objects to a single, standardized form was applied. In this work, we developed a species normalization module, which recognizes species names and normalizes them to NCBI Taxonomy IDs. Unlike most previous work, which ignored the prefix of a gene name that represents an abbreviation of the species name to which the gene belongs, the recognition results of our module include the prefixed species. The developed species normalization module achieved an overall F-score of 0.954 on an instance-level species normalization corpus. For gene normalization, two separate modules were respectively employed to recognize gene mentions and normalize those mentions to their Entrez Gene IDs by utilizing a multistage normalization algorithm developed for processing full-text articles. All of the developed modules are BioC-compatible .NET framework libraries and are publicly available from the NuGet gallery.
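The species normalization idea described in the abstract (mapping species names to NCBI Taxonomy IDs, and also treating a species prefix on a gene name as a species mention) can be illustrated with a minimal Python sketch. This is not the authors' .NET implementation: the tiny lexicon, the single-letter prefix table (`h`, `m`, `r`), the regular expressions, and the function name `normalize_species` are all assumptions made for illustration only.

```python
import re

# Toy lexicon (assumption): a few species names mapped to NCBI Taxonomy IDs.
SPECIES_LEXICON = {
    "homo sapiens": "9606",
    "human": "9606",
    "mus musculus": "10090",
    "mouse": "10090",
    "rattus norvegicus": "10116",
    "rat": "10116",
}

# Hypothetical prefix table: a lowercase letter in front of a gene symbol
# is read as a species abbreviation (e.g. the "h" in "hTERT").
GENE_PREFIXES = {"h": "9606", "m": "10090", "r": "10116"}

# Longest names first so that "mus musculus" wins over any shorter entry.
_names = sorted(SPECIES_LEXICON, key=len, reverse=True)
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in _names) + r")\b", re.IGNORECASE
)
# A prefix letter at a word boundary, directly followed by an uppercase-initial token.
PREFIX_PATTERN = re.compile(r"\b([hmr])(?=[A-Z][A-Za-z0-9]+\b)")


def normalize_species(text):
    """Return sorted (offset, mention, taxonomy_id) tuples found in text."""
    mentions = []
    for m in NAME_PATTERN.finditer(text):
        mentions.append((m.start(), m.group(0), SPECIES_LEXICON[m.group(0).lower()]))
    for m in PREFIX_PATTERN.finditer(text):
        mentions.append((m.start(), m.group(1), GENE_PREFIXES[m.group(1)]))
    return sorted(mentions)


if __name__ == "__main__":
    for hit in normalize_species("hTERT is expressed in human cells and in Mus musculus."):
        print(hit)
```

A real system would need a full taxonomy lexicon and disambiguation for prefixes that do not denote a species; the sketch only shows why prefixed mentions add instances that a name-only matcher would miss.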
Pages: 10