Two learning approaches for protein name extraction

Cited by: 6
Authors
Tatar, Serhan [1 ]
Cicekli, Ilyas [1 ]
Affiliation
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Keywords
Statistical learning; Bigram language model; Rule learning; Protein name extraction; Information extraction; GENE; IDENTIFICATION; PERFORMANCE; BLAST
DOI
10.1016/j.jbi.2009.05.004
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Protein name extraction, one of the basic tasks in automatic extraction of information from biological texts, remains challenging. In this paper, we explore the use of two different machine learning techniques and present the results of our experiments. In the first method, a bigram language model is used to extract protein names. In the second, we use an automatic rule learning method that identifies protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types. We conducted our experiments on two different datasets. Our first method, based on the bigram language model, achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The rule learning method obtained an F-score of 61.8% on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature. © 2009 Elsevier Inc. All rights reserved.
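The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: tokens are generalized to coarse syntactic types, and a bigram language model is estimated over those type sequences. The type categories, the `token_type` and `BigramModel` names, the add-one smoothing, and the toy training names are all assumptions made for illustration, not the authors' method or data.

```python
import math
import re
from collections import Counter


def token_type(token: str) -> str:
    """Map a token to a coarse syntactic type (hypothetical categories)."""
    if re.fullmatch(r"[0-9]+", token):
        return "NUM"
    if re.fullmatch(r"[A-Z]+", token):
        return "ALLCAPS"
    if re.fullmatch(r"[A-Za-z]+[0-9]+[A-Za-z0-9]*", token):
        return "ALNUM"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "CAPITALIZED"
    if re.fullmatch(r"[a-z]+", token):
        return "LOWER"
    return "OTHER"


class BigramModel:
    """Bigram model over type sequences with add-one smoothing (an assumption)."""

    def __init__(self, sequences):
        self.unigrams = Counter()  # counts of each symbol as a bigram context
        self.bigrams = Counter()   # counts of (previous, current) pairs
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            self.unigrams.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))

    def prob(self, prev: str, cur: str) -> float:
        """Smoothed estimate of P(cur | prev)."""
        return (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + len(self.vocab))

    def log_prob(self, seq) -> float:
        """Log-probability of a whole type sequence, including boundaries."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(padded, padded[1:]))


# Toy, hand-picked name fragments used only to exercise the code.
toy_names = [["p53"], ["interleukin", "2"], ["IL", "2"]]
model = BigramModel([[token_type(t) for t in name] for name in toy_names])

# Score two candidate token sequences by the likelihood of their type sequences.
candidate_a = [token_type(t) for t in ["TNF", "1"]]   # ALLCAPS NUM
candidate_b = [token_type(t) for t in ["1", "the"]]   # NUM LOWER
print(model.log_prob(candidate_a), model.log_prob(candidate_b))
```

Because the model is trained on types rather than surface strings, an unseen name such as "TNF 1" can still score well when its type pattern resembles the training names. A real extractor would also need a model of non-name text and a decision rule for name boundaries, neither of which is sketched here.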
Pages: 1046-1055
Number of pages: 10