Gene/protein name recognition based on support vector machine using dictionary as features

Cited by: 33
Authors
Mitsumori, T
Fation, S
Murata, M
Doi, K
Doi, H
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara 6300101, Japan
[2] Natl Inst Informat & Commun Technol, Kyoto 6190289, Japan
DOI
10.1186/1471-2105-6-S1-S8
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition.
Results: In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting.
Conclusion: During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.
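The feature set named in the abstract (word, POS, orthography, prefix, suffix, preceding class, plus a dictionary-match flag) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `orthography` and `token_features`, the orthographic categories, the affix length of 3, and the single-token dictionary lookup are all assumptions made for illustration.

```python
import re


def orthography(token):
    """Return a coarse orthographic class (hypothetical categories, not the paper's)."""
    if token.isdigit():
        return "DIGITS"
    if re.search(r"[A-Za-z]", token) and re.search(r"\d", token):
        return "ALPHANUM"
    if token.isupper():
        return "ALLCAPS"
    if token[:1].isupper():
        return "INITCAP"
    if token.islower():
        return "LOWER"
    return "OTHER"


def token_features(tokens, pos_tags, i, prev_label, dictionary, affix_len=3):
    """Build one feature dict for tokens[i].

    The first six entries correspond to the "internal resource features"
    listed in the abstract; "in_dict" stands in for the "external resource
    feature". Exact single-token lookup is an assumed simplification.
    """
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "pos": pos_tags[i],
        "orth": orthography(tok),
        "prefix": tok[:affix_len],
        "suffix": tok[-affix_len:],
        "prev_class": prev_label,          # label predicted for the preceding token
        "in_dict": tok.lower() in dictionary,
    }


# Toy usage: a three-token sentence with hand-assigned POS tags.
tokens = ["The", "p53", "protein"]
pos_tags = ["DT", "NN", "NN"]
gene_dictionary = {"p53"}
features = token_features(tokens, pos_tags, 1, "O", gene_dictionary)
```

Feature dicts like this would typically be converted to sparse binary vectors before being passed to an SVM; that step is omitted here.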
Pages: 10
Related papers
  • [1] Gene/protein name recognition based on support vector machine using dictionary as features
    Tomohiro Mitsumori
    Sevrani Fation
    Masaki Murata
    Kouichi Doi
    Hirohumi Doi
    BMC Bioinformatics, 6
  • [2] Fractional Fourier transform based features for speaker recognition using support vector machine
    Ajmera, Pawan K.
    Holambe, Raghunath S.
    COMPUTERS & ELECTRICAL ENGINEERING, 2013, 39 (02) : 550 - 557
  • [3] Protein-Protein Recognition Prediction Using Support Vector Machine Based on Feature Vectors
    Kuo, Huang-Cheng
    Ong, Ping-Lin
    Lin, Jung-Chang
    Huang, Jen-Peng
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS, PROCEEDINGS, 2008, : 200 - +
  • [4] Phoneme Recognition Using Support Vector Machine and Different Features Representations
    Amami, Rimah
    Ben Ayed, Dorra
    Ellouze, Noureddine
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2012, 151 : 587 - 595
  • [5] A Driver Fatigue Recognition Model Using Multiple Physiological Features based on Support Vector Machine
    Li Shiwu
    Wang Linhong
    Yang Zhifa
    Ji Bingkui
    Qiao Feiyan
    Yang Zhongkai
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (12A) : 5321 - 5328
  • [6] Protein Name Recognition Based on Dictionary Mining and Heuristics
    Lin, Shian-Hua
    Ding, Shao-Hong
    Zeng, Wei-Sheng
    ALGORITHMIC ASPECTS IN INFORMATION AND MANAGEMENT, AAIM 2014, 2014, 8546 : 75 - 87
  • [7] HUMAN ACTIVITY RECOGNITION USING BODY POSE FEATURES AND SUPPORT VECTOR MACHINE
    Bengalur, Megha D.
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 1970 - 1975
  • [8] Features classification using support vector machine for a facial expression recognition system
    Patil, Rajesh A.
    Sahula, Vineet
    Mandal, Atanendu S.
    JOURNAL OF ELECTRONIC IMAGING, 2012, 21 (04)
  • [9] Weed/corn seedling recognition by support vector machine using texture features
    Wu, Lanlan
    Wen, Youxian
    AFRICAN JOURNAL OF AGRICULTURAL RESEARCH, 2009, 4 (09) : 840 - 846
  • [10] Automatic Digital Modulation Recognition Based on Novel Features and Support Vector Machine
    Hassanpour, Salman
    Pezeshk, Amir Mansour
    Behnia, Fereidoon
    2016 12TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS (SITIS), 2016, : 172 - 177