A cascaded approach to normalising gene mentions in biomedical literature

Times cited: 1

Authors:

Yang, Hui [1]

Nenadic, Goran [1]

Keane, John A. [1]

Affiliation:

[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England

Source:

BIOINFORMATION | 2007 / Vol. 2 / Issue 5

Funding:

Biotechnology and Biological Sciences Research Council (UK)

Keywords:

gene name normalisation; gene name mapping; lexical variability; text mining;

DOI:

10.6026/97320630002197

Chinese Library Classification:

Q [Biological Sciences]

Subject classification codes:

07; 0710; 09

Abstract:

Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation and to the ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first preprocessed. The mapping method is cascaded: it combines exact, exact-like and token-based approximate matching, applied to flexible representations of the gene synonym dictionary and of the gene mentions generated during preprocessing. We also consider multi-gene name mentions and the permutation of components in gene names. A systematic evaluation of the suggested methods identified steps that are beneficial for improving either precision or recall in gene name identification. Experiments on the BioCreAtIvE2 data sets (identification of human gene names) showed that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.
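The cascade described in the abstract (exact, then exact-like, then token-based approximate matching) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact-like normalisation (lowercasing and stripping non-alphanumeric characters), the tokeniser, the Jaccard-overlap scoring and the 0.75 threshold are all assumptions, and the synonym dictionary and identifiers (`ID_A`, `ID_B`) are hypothetical toy data. Comparing tokens as an unordered set is one simple way to tolerate permuted name components.

```python
import re
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def exact_like_key(name: str) -> str:
    """Assumed 'exact-like' form: lowercase, drop spaces, hyphens and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def token_set(name: str) -> FrozenSet[str]:
    """Lowercase tokens, split at punctuation and at letter/digit boundaries."""
    return frozenset(re.findall(r"[a-z]+|[0-9]+", name.lower()))


class CascadedMatcher:
    """Hypothetical matcher: exact, then exact-like, then token-based matching."""

    def __init__(self, synonyms: Iterable[Tuple[str, str]], threshold: float = 0.75):
        # (synonym, identifier) pairs; one synonym may map to several identifiers.
        self.exact: Dict[str, Set[str]] = {}
        self.exact_like: Dict[str, Set[str]] = {}
        self.token_index: List[Tuple[FrozenSet[str], str]] = []
        for synonym, gene_id in synonyms:
            self.exact.setdefault(synonym, set()).add(gene_id)
            self.exact_like.setdefault(exact_like_key(synonym), set()).add(gene_id)
            self.token_index.append((token_set(synonym), gene_id))
        self.threshold = threshold  # assumed cut-off for the approximate stage

    def match(self, mention: str) -> Tuple[str, Set[str]]:
        # Stage 1: exact string match against the dictionary.
        if mention in self.exact:
            return "exact", set(self.exact[mention])
        # Stage 2: exact-like match on the normalised form.
        key = exact_like_key(mention)
        if key in self.exact_like:
            return "exact-like", set(self.exact_like[key])
        # Stage 3: token-based approximate match (Jaccard overlap of token sets),
        # keeping every identifier that ties for the best positive score.
        mention_tokens = token_set(mention)
        best_score, best_ids = 0.0, set()
        for synonym_tokens, gene_id in self.token_index:
            union = mention_tokens | synonym_tokens
            if not union:
                continue
            score = len(mention_tokens & synonym_tokens) / len(union)
            if score > best_score:
                best_score, best_ids = score, {gene_id}
            elif score == best_score and score > 0:
                best_ids.add(gene_id)
        if best_score >= self.threshold:
            return "token", best_ids
        return "none", set()


if __name__ == "__main__":
    # Hypothetical toy dictionary; identifiers are placeholders, not database IDs.
    matcher = CascadedMatcher([
        ("IL2", "ID_A"),
        ("interleukin 2", "ID_A"),
        ("interleukin 2 receptor alpha", "ID_B"),
    ])
    for mention in ["IL2", "IL-2", "receptor alpha interleukin 2", "unrelated protein"]:
        print(mention, "->", matcher.match(mention))
```

Each stage is tried only if the previous one finds nothing, so the stricter (higher-precision) matches take priority and the looser approximate stage mainly adds recall. Splitting multi-gene mentions into individual names before matching is not shown here.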


Pages: 197-206

Number of pages: 10
