A Hybrid Strategy to Protein Name Recognition

Times cited: 0
Authors
Wang, Haochang [1]
Zhao, Tiejun [2]
Affiliations
[1] Daqing Petr Inst, Coll Comp & Informat Technol, Daqing 163318, Peoples R China
[2] Harbin Inst Technol, MOE MS Key Lab Nat Language Proc & Speech, Harbin 150001, Peoples R China
Keywords
named entity recognition; Generalized Winnow; feature selection; boundary expansion
DOI
10.1109/WCICA.2008.4592995
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a comprehensive approach to identifying protein names in biomedical texts. The method integrates the Generalized Winnow algorithm with heuristic rules to perform the initial detection of protein names. The system then applies a statistical method to estimate the reliability of each recognized protein-name boundary, and boundaries with low confidence are expanded. Experimental results show that the approach improves overall protein name recognition performance and is effective at identifying the boundaries of protein names.
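The record does not give the authors' exact update rule, so the sketch below only illustrates the multiplicative-update idea that the Winnow family, including Generalized Winnow, is built on. It is a simplified balanced Winnow (separate positive and negative weight vectors, updated whenever a token's margin is too small), not the regularized Generalized Winnow update used in the paper. The class name `BalancedWinnow`, the parameters `mu`, `eta` and `margin`, and the token features such as `suffix=ase` are all hypothetical illustrations; the heuristic rules and the boundary-expansion step are not modelled.

```python
import math
from typing import Dict, List, Tuple

# Sparse token features: feature name -> value. The feature names used below
# ("suffix=ase", "has_digit", ...) are hypothetical illustrations.
Features = Dict[str, float]


class BalancedWinnow:
    """Simplified balanced Winnow for binary token classification.

    Keeps a positive weight vector u and a negative weight vector v
    (all entries start at mu > 0); the effective weight of a feature is
    u[f] - v[f]. Updates are multiplicative and happen whenever the
    example's margin y * score falls below `margin`.
    Assumption: this is NOT the exact (regularized) Generalized Winnow
    update used in the paper.
    """

    def __init__(self, mu: float = 1.0, eta: float = 0.5, margin: float = 1.0):
        self.mu = mu          # initial value of every u[f] and v[f]
        self.eta = eta        # learning rate applied inside the exponent
        self.margin = margin  # margin an example must reach to skip updates
        self.u: Dict[str, float] = {}
        self.v: Dict[str, float] = {}

    def score(self, x: Features) -> float:
        # Unseen features have u = v = mu, so they contribute 0.
        return sum((self.u.get(f, self.mu) - self.v.get(f, self.mu)) * val
                   for f, val in x.items())

    def predict(self, x: Features) -> int:
        # +1: token is inside a protein name, -1: it is not.
        return 1 if self.score(x) > 0 else -1

    def update(self, x: Features, y: int) -> bool:
        """Apply one multiplicative update if the margin is violated."""
        if y * self.score(x) >= self.margin:
            return False
        for f, val in x.items():
            # Promote u and demote v for features that agree with the label,
            # and the reverse for features that disagree with it.
            self.u[f] = self.u.get(f, self.mu) * math.exp(self.eta * y * val)
            self.v[f] = self.v.get(f, self.mu) * math.exp(-self.eta * y * val)
        return True

    def fit(self, data: List[Tuple[Features, int]], epochs: int = 10) -> "BalancedWinnow":
        for _ in range(epochs):
            changed = False
            for x, y in data:
                changed = self.update(x, y) or changed
            if not changed:  # every example already satisfies the margin
                break
        return self


# Toy training tokens; "bias" is a constant feature acting as a threshold.
TOY_DATA: List[Tuple[Features, int]] = [
    ({"bias": 1.0, "suffix=ase": 1.0}, 1),
    ({"bias": 1.0, "has_digit": 1.0, "upper": 1.0}, 1),
    ({"bias": 1.0, "lower": 1.0}, -1),
    ({"bias": 1.0, "lower": 1.0, "suffix=ing": 1.0}, -1),
]

if __name__ == "__main__":
    model = BalancedWinnow().fit(TOY_DATA, epochs=20)
    print([model.predict(x) for x, _ in TOY_DATA])  # prints [1, 1, -1, -1]
```

Keeping two non-negative weight vectors lets purely multiplicative updates still produce negative effective weights, which is what allows Winnow-style learners to handle features that argue against a label. In a full system of the kind the abstract describes, the per-token predictions would then be combined with heuristic rules and a boundary-reliability check; those parts are left out here because the record does not specify them.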
Pages: 627-+
Page count: 2