A cascaded approach to normalising gene mentions in biomedical literature

Cited by: 1
Authors
Yang, Hui [1]
Nenadic, Goran [1]
Keane, John A. [1]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
gene name normalisation; gene name mapping; lexical variability; text mining;
DOI
10.6026/97320630002197
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task because of lexical and terminological variation and the ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, in which pre-processing is first applied both to gene synonyms in the databases and to gene mentions in text. The mapping method employs a cascaded approach that combines exact, exact-like and token-based approximate matching, using flexible representations of a gene synonym dictionary and of gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. Experiments on the BioCreAtIvE2 data sets (identification of human gene names) showed that our methods achieved an F-measure of up to 81.20%.
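The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synonym dictionary, the placeholder identifiers (`ID:A`, `ID:B`), the normalisation rule, the tokenisation rule, the Jaccard overlap score and the `min_overlap` threshold are all assumptions chosen for illustration. Each mention is tried against an exact lookup first, then against a normalised ("exact-like") lookup, and only then against a token-set comparison, which also tolerates permuted name components.

```python
import re

# Hypothetical synonym dictionary: synonym -> placeholder gene identifiers.
SYNONYMS = {
    "BRCA1": {"ID:A"},
    "breast cancer 1": {"ID:A"},
    "TP53": {"ID:B"},
    "tumor protein p53": {"ID:B"},
}

def normalise(name):
    """Assumed exact-like rule: lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def tokens(name):
    """Assumed token rule: lowercase runs of letters or of digits, as a set."""
    return frozenset(re.findall(r"[a-z]+|[0-9]+", name.lower()))

def build_indexes(synonyms):
    """Pre-process the dictionary into the representations each stage needs."""
    exact_like = {}
    token_index = []
    for synonym, ids in synonyms.items():
        exact_like.setdefault(normalise(synonym), set()).update(ids)
        token_index.append((tokens(synonym), ids))
    return synonyms, exact_like, token_index

def map_mention(mention, indexes, min_overlap=0.8):
    """Return (stage, identifiers) from the first stage that matches."""
    exact, exact_like, token_index = indexes
    # Stage 1: exact string match.
    if mention in exact:
        return "exact", set(exact[mention])
    # Stage 2: exact-like match on the normalised form.
    key = normalise(mention)
    if key in exact_like:
        return "exact-like", set(exact_like[key])
    # Stage 3: token-based approximate match (Jaccard overlap of token sets).
    mention_tokens = tokens(mention)
    best, best_ids = 0.0, set()
    for synonym_tokens, ids in token_index:
        if not mention_tokens or not synonym_tokens:
            continue
        score = len(mention_tokens & synonym_tokens) / len(mention_tokens | synonym_tokens)
        if score > best:
            best, best_ids = score, set(ids)
        elif score == best and score > 0:
            best_ids |= ids  # keep ties, leaving ambiguity visible to the caller
    if best >= min_overlap:
        return "token", best_ids
    return "none", set()

INDEXES = build_indexes(SYNONYMS)
print(map_mention("BRCA1", INDEXES))
print(map_mention("Brca-1", INDEXES))
print(map_mention("protein p53 tumor", INDEXES))
print(map_mention("insulin", INDEXES))
```

Ordering the stages from strictest to loosest means a looser stage is consulted only when the stricter ones fail, which is one plausible way to trade precision against recall; handling of multi-gene mentions is left out of this sketch.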
Pages: 197 - 206
Number of pages: 10
Related papers
50 in total
  • [1] Automated recognition of malignancy mentions in biomedical literature
    Jin, Yang
    McDonald, Ryan T.
    Lerman, Kevin
    Mandel, Mark A.
    Carroll, Steven
    Liberman, Mark Y.
    Pereira, Fernando C.
    Winters, Raymond S.
    White, Peter S.
    BMC BIOINFORMATICS, 2006, 7 (1)
  • [2] Automated recognition of malignancy mentions in biomedical literature
    Yang Jin
    Ryan T McDonald
    Kevin Lerman
    Mark A Mandel
    Steven Carroll
    Mark Y Liberman
    Fernando C Pereira
    Raymond S Winters
    Peter S White
    BMC Bioinformatics, 7
  • [3] A cascaded classification approach to disambiguating polysemous mentions with social chains
    Wei, Yu-Chuan
    Lin, Ming-Shun
    Chen, Hsin-Hsi
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (07) : 5404 - 5414
  • [4] A text mining approach to detect mentions of protein glycosylation in biomedical text
    Shukla, Daksha
    Jayaraman, Valadi K.
    BIOINFORMATION, 2012, 8 (16) : 758 - 762
  • [5] A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature
    Xu, Shuo
    An, Xin
    Zhu, Lijun
    Zhang, Yunliang
    Zhang, Haodong
    JOURNAL OF CHEMINFORMATICS, 2015, 7
  • [6] A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature
    Shuo Xu
    Xin An
    Lijun Zhu
    Yunliang Zhang
    Haodong Zhang
    Journal of Cheminformatics, 7
  • [7] PPPred: Classifying Protein-phenotype Co-mentions Extracted from Biomedical Literature
    Shahri, Morteza Pourreza
    Reynolds, Gillian
    Roe, Mandi Marie
    Kahanda, Indika
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019, : 414 - 422
  • [8] Predicting Entity Mentions in Scientific Literature
    Zheng, Yalung
    Ezeiza, Jon
    Farzanehpour, Mehdi
    Urbani, Jacopo
    SEMANTIC WEB, ESWC 2019, 2019, 11503 : 379 - 393
  • [9] A cascaded approach to biomedical named entity recognition using a unified model
    Chan, Shing-Kit
    Lam, Wai
    Yu, Xiaofeng
    ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 93 - 102
  • [10] Gene name automatic recognition in biomedical literature
    Yang, Zhihao
    Lin, Hongfei
    Zhao, Jing
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 285 - 285