A cascaded approach to normalising gene mentions in biomedical literature

Cited by: 1
Authors
Yang, Hui [1]
Nenadic, Goran [1]
Keane, John A. [1]
Affiliation
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
gene name normalisation; gene name mapping; lexical variability; text mining
DOI
10.6026/97320630002197
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task because of lexical and terminological variation and the ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to linking gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first preprocessed. The mapping method employs a cascaded approach that combines exact, exact-like and token-based approximate matching, using flexible representations of a gene synonym dictionary and of gene mentions generated during the preprocessing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. Experiments on the BioCreAtIvE2 data sets (identification of human gene names) showed that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.
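The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the normalisation rule, the Jaccard token-overlap score, the 0.8 threshold, the function names and the two-entry dictionary with identifiers `G1` and `G2` are all assumptions chosen only to show how an exact, exact-like and token-based stage might be chained, with each stage tried only when the previous one finds nothing.

```python
import re

def normalise(name):
    """Exact-like key (assumed rule): lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def tokens(name):
    """Order-free token set, split at non-alphanumerics and letter/digit boundaries."""
    return frozenset(re.findall(r"[a-z]+|[0-9]+", name.lower()))

def build_index(synonyms):
    """Preprocess the synonym dictionary into one lookup structure per stage."""
    exact, exact_like, token_sets = {}, {}, []
    for gene_id, names in synonyms.items():
        for name in names:
            exact.setdefault(name, set()).add(gene_id)
            exact_like.setdefault(normalise(name), set()).add(gene_id)
            token_sets.append((tokens(name), gene_id))
    return exact, exact_like, token_sets

def link(mention, index, threshold=0.8):
    """Return (candidate gene ids, stage that produced them)."""
    exact, exact_like, token_sets = index
    # Stage 1: exact string match.
    if mention in exact:
        return exact[mention], "exact"
    # Stage 2: exact-like match on the normalised form.
    key = normalise(mention)
    if key in exact_like:
        return exact_like[key], "exact-like"
    # Stage 3: token-based approximate match (Jaccard overlap is an assumption).
    mention_tokens = tokens(mention)
    hits = set()
    for synonym_tokens, gene_id in token_sets:
        union = mention_tokens | synonym_tokens
        if union and len(mention_tokens & synonym_tokens) / len(union) >= threshold:
            hits.add(gene_id)
    return hits, ("token" if hits else "none")

# Hypothetical dictionary; G1 and G2 are placeholder identifiers.
SYNONYMS = {
    "G1": ["tumor necrosis factor alpha", "TNF-alpha"],
    "G2": ["interleukin 2 receptor", "IL2R"],
}
index = build_index(SYNONYMS)
```

With this toy dictionary, `link("TNF alpha", index)` is resolved at the exact-like stage, while `link("receptor, interleukin 2", index)` falls through to the token stage, where comparing token sets rather than sequences tolerates permuted name components. A mention that more than one identifier matches comes back as a multi-element set, so disambiguation would need a further step.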
Pages: 197-206
Number of pages: 10