A cascaded approach to normalising gene mentions in biomedical literature

Cited by: 1
Authors
Yang, Hui [1]
Nenadic, Goran [1]
Keane, John A. [1]
Affiliation
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
gene name normalisation; gene name mapping; lexical variability; text mining
DOI
10.6026/97320630002197
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation and the ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to linking gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first preprocessed. The mapping method employs a cascaded approach that combines exact, exact-like and token-based approximate matching, using flexible representations of the gene synonym dictionary and of the gene mentions generated during the preprocessing phase. We also consider multi-gene name mentions and permutations of components in gene names. A systematic evaluation of the suggested methods identified steps that are beneficial for improving either precision or recall in gene name identification. Experiments on the BioCreAtIvE2 data sets (identification of human gene names) demonstrated that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.
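The abstract describes the cascade only at a high level, so the following Python sketch illustrates the idea under stated assumptions; it is not the authors' implementation. The normalisation rules (lower-casing, splitting at punctuation and letter/digit boundaries, mapping Greek letters and small Roman numerals, dropping a few stop words), the Dice-coefficient token matcher with a 0.8 threshold, the `CascadedMatcher` class and the toy dictionary with placeholder identifiers G1, G2 and G3 are all hypothetical. Stage 3 compares token sets, so it tolerates permuted name components such as "receptor alpha interleukin 2". Multi-gene mentions are not handled.

```python
import re
from dataclasses import dataclass

# Hypothetical normalisation rules, chosen for illustration only.
GREEK_SYMBOLS = {"α": " alpha ", "β": " beta ", "γ": " gamma ", "κ": " kappa "}
GREEK_NAMES = {"alpha": "a", "beta": "b", "gamma": "g", "delta": "d", "kappa": "k"}
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}
STOP_TOKENS = {"gene", "protein", "human"}

# Toy synonym dictionary with placeholder identifiers (not real database IDs).
SYNONYMS = {
    "G1": ["IL2", "interleukin 2", "TCGF"],
    "G2": ["TNF", "TNF-alpha", "tumor necrosis factor"],
    "G3": ["IL2RA", "interleukin 2 receptor alpha"],
}


def normalise_tokens(name: str) -> list[str]:
    """Lower-case, split at punctuation and letter/digit boundaries,
    canonicalise Greek letters and small Roman numerals, drop stop tokens."""
    text = name.lower()
    for symbol, spelled in GREEK_SYMBOLS.items():
        text = text.replace(symbol, spelled)
    tokens = []
    for tok in re.findall(r"[a-z]+|[0-9]+", text):
        tok = GREEK_NAMES.get(tok, tok)  # e.g. "alpha" -> "a"
        tok = ROMAN.get(tok, tok)        # e.g. "ii" -> "2"
        if tok not in STOP_TOKENS:
            tokens.append(tok)
    return tokens


@dataclass(frozen=True)
class Match:
    gene_id: str
    stage: str  # "exact", "exact-like" or "approximate"
    score: float


class CascadedMatcher:
    """Tries progressively looser matching stages and stops at the first hit."""

    def __init__(self, synonyms: dict[str, list[str]], threshold: float = 0.8):
        self.threshold = threshold
        self.exact: dict[str, set[str]] = {}       # lower-cased synonym -> ids
        self.exact_like: dict[str, set[str]] = {}  # concatenated tokens -> ids
        self.token_sets: list[tuple[frozenset[str], str]] = []
        for gene_id, names in synonyms.items():
            for name in names:
                tokens = normalise_tokens(name)
                self.exact.setdefault(name.lower(), set()).add(gene_id)
                if tokens:
                    self.exact_like.setdefault("".join(tokens), set()).add(gene_id)
                    self.token_sets.append((frozenset(tokens), gene_id))

    def match(self, mention: str) -> list[Match]:
        # Stage 1: exact, case-insensitive lookup of the raw mention.
        ids = self.exact.get(mention.lower())
        if ids:
            return [Match(i, "exact", 1.0) for i in sorted(ids)]
        tokens = normalise_tokens(mention)
        if not tokens:
            return []
        # Stage 2: exact-like lookup on the normalised, concatenated tokens.
        ids = self.exact_like.get("".join(tokens))
        if ids:
            return [Match(i, "exact-like", 1.0) for i in sorted(ids)]
        # Stage 3: token-based approximate matching (Dice coefficient over
        # token sets), which ignores the order of name components.
        query = frozenset(tokens)
        best: dict[str, float] = {}
        for syn_tokens, gene_id in self.token_sets:
            dice = 2 * len(query & syn_tokens) / (len(query) + len(syn_tokens))
            if dice >= self.threshold and dice > best.get(gene_id, 0.0):
                best[gene_id] = dice
        hits = [Match(i, "approximate", s) for i, s in best.items()]
        return sorted(hits, key=lambda m: (-m.score, m.gene_id))


if __name__ == "__main__":
    matcher = CascadedMatcher(SYNONYMS)
    for mention in ["IL2", "IL-2", "TNF-α", "human interleukin 2 gene",
                    "receptor alpha interleukin 2", "xyz"]:
        print(mention, "->", matcher.match(mention))
```

In this sketch a stage runs only when every stricter stage has returned nothing, so exact hits always take precedence over approximate ones. Lowering `threshold` lets the last stage accept more partial token overlaps, which would typically raise recall at the cost of precision.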
Pages: 197-206
Number of pages: 10