RECOGNITION OF GENE/PROTEIN NAMES USING CONDITIONAL RANDOM FIELDS

Times cited: 0
Authors
Campos, David [1]
Matos, Sergio [1]
Oliveira, Jose Luis [1]
Affiliation
[1] Univ Aveiro, Inst Elect & Telemat Engn Aveiro, Campus Univ Santiago, P-3810193 Aveiro, Portugal
Keywords
Natural Language Processing; Text Mining; Machine Learning; Named Entity Recognition; Gene/Protein Names; Protein
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the overwhelming amount of publicly available data in the biomedical field, the traditional tasks performed by expert database annotators have rapidly become difficult and very expensive. This situation has led to the development of computerized systems that extract information in a structured manner. The first step in such systems is the identification of named entities (e.g. gene/protein names), a task called Named Entity Recognition (NER). Much of the current research on this problem is based on Machine Learning (ML) techniques, which require careful and sensitive configuration of the methods involved. This article presents an NER system that uses Conditional Random Fields (CRFs) as the machine learning technique and combines the best-performing techniques recently described in the literature. The proposed system uses biomedical domain knowledge and a large set of orthographic and morphological features. It obtained an F-measure of 0.7936 on the BioCreative II Gene Mention corpus, a significantly better performance than that of similar baseline systems.
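The abstract mentions "a large set of orthographic and morphological features" without listing them. As an illustration only, the sketch below shows the kind of token-level features commonly fed to a linear-chain CRF for gene/protein NER: capitalization, digits, hyphens, Greek-letter substrings, a compressed word shape, three-character prefixes and suffixes, and a one-token context window. The feature names, the `GREEK` set, and the helper functions are hypothetical assumptions; this is not the authors' feature set or implementation.

```python
import re

# Hypothetical, non-exhaustive list: gene/protein names often embed Greek
# letters spelled out (e.g. "TNF-alpha", "NF-kappaB").
GREEK = {"alpha", "beta", "gamma", "delta", "kappa"}


def word_shape(token: str) -> str:
    """Map each character to a class: upper -> 'A', lower -> 'a', digit -> '0'.

    Any other character (hyphen, slash, ...) is kept unchanged.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        elif ch.isdigit():
            shape.append("0")
        else:
            shape.append(ch)
    return "".join(shape)


def compact_shape(token: str) -> str:
    """Collapse runs of the same class, e.g. 'AA-aaaaaA' -> 'A-aA'."""
    return re.sub(r"(.)\1+", r"\1", word_shape(token))


def token_features(tokens: list[str], i: int) -> dict:
    """Orthographic/morphological features for tokens[i] (illustrative set only)."""
    tok = tokens[i]
    lower = tok.lower()
    feats = {
        "word": lower,
        "init_cap": tok[:1].isupper(),
        "all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_dash": "-" in tok,
        "greek": any(g in lower for g in GREEK),
        "shape": compact_shape(tok),
        "prefix3": lower[:3],  # morphological cues: affixes
        "suffix3": lower[-3:],
        "length": len(tok),
    }
    # Context window of +/-1 token, with sentence-boundary markers.
    if i > 0:
        feats["prev_word"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: list[str]) -> list[dict]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sent = ["The", "IL-2", "gene", "binds", "NF-kappaB"]
    for tok, f in zip(sent, sentence_features(sent)):
        print(tok, f["shape"], f["greek"])
```

Each sentence becomes a list of feature dictionaries, one per token. During training, a CRF toolkit would pair this list with a label sequence (for example B/I/O tags marking the start, inside, and outside of gene mentions) and learn weights over these features and over label transitions.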
Pages: 275-280
Page count: 6