A cascaded approach to normalising gene mentions in biomedical literature

Cited by: 1
Authors
Yang, Hui [1]
Nenadic, Goran [1]
Keane, John A. [1]
Affiliation
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
gene name normalisation; gene name mapping; lexical variability; text mining
DOI
10.6026/97320630002197
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task because of lexical and terminological variation and the ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first preprocessed. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and of the gene mentions generated during the preprocessing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. Experiments on the BioCreAtIvE2 data sets (identification of human gene names) showed that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.
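The cascade described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the normalisation rule, the Jaccard token score, the 0.8 threshold, the function names and the toy dictionary with made-up identifiers (G1, G2) are all assumptions chosen only to show the exact, exact-like and token-based stages in order.

```python
import re
from collections import defaultdict


def normalise(name):
    """Exact-like key (assumed rule): lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def tokens(name):
    """Order-insensitive token set, split at letter/digit boundaries."""
    return frozenset(re.findall(r"[a-z]+|[0-9]+", name.lower()))


def build_indexes(dictionary):
    """Preprocess the synonym dictionary into one lookup per cascade stage."""
    exact, exact_like, token_sets = defaultdict(set), defaultdict(set), []
    for gene_id, synonyms in dictionary.items():
        for syn in synonyms:
            exact[syn].add(gene_id)
            exact_like[normalise(syn)].add(gene_id)
            token_sets.append((tokens(syn), gene_id))
    return exact, exact_like, token_sets


def link_mention(mention, indexes, threshold=0.8):
    """Try each stage in turn and stop at the first one that matches."""
    exact, exact_like, token_sets = indexes
    if mention in exact:                      # stage 1: exact match
        return set(exact[mention]), "exact"
    key = normalise(mention)
    if key in exact_like:                     # stage 2: exact-like match
        return set(exact_like[key]), "exact-like"
    m = tokens(mention)                       # stage 3: token-based match
    best, best_ids = 0.0, set()
    for t, gene_id in token_sets:
        union = m | t
        if not union:
            continue
        score = len(m & t) / len(union)       # Jaccard overlap (assumed metric)
        if score > best:
            best, best_ids = score, {gene_id}
        elif score == best and score > 0:
            best_ids.add(gene_id)
    if best >= threshold:
        return best_ids, "token"
    return set(), "none"


# Toy dictionary; the identifiers are invented for illustration only.
TOY_DICTIONARY = {
    "G1": ["BRCA1", "breast cancer 1"],
    "G2": ["TNF-alpha", "tumor necrosis factor alpha"],
}
INDEXES = build_indexes(TOY_DICTIONARY)
```

Because the third stage compares token sets rather than strings, a mention such as "alpha tumor necrosis factor" still reaches the synonym "tumor necrosis factor alpha", which is one simple way to tolerate permuted name components. Running the strict stages first favours precision, while the looser token stage recovers recall for mentions the earlier stages miss.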
Pages: 197-206
Number of pages: 10