Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines

Cited by: 1
Authors
Raja, Kalpana [1 ,2 ]
Natarajan, Jeyakumar [1 ]
Affiliations
[1] Bharathiar Univ, Sch Life Sci, Dept Bioinformat, Data Min & Text Min Lab, Coimbatore 641046, Tamil Nadu, India
[2] Univ Michigan, Sch Med, Dept Dermatol, Ann Arbor, MI USA
Keywords
Human protein phosphorylation; hPP corpus; Support Vector Machines; Natural language processing; Information extraction; Post transcriptional modification; EXTRACTION; DATABASE; SYSTEM
DOI
10.1016/j.cmpb.2018.03.022
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background: Extracting protein phosphorylation information from the biomedical literature has gained much attention because of the importance of phosphorylation in numerous biological processes.
Objective: In this study, we propose a two-phase text mining methodology, NLP parsing followed by SVM classification, to extract phosphorylation information from the literature.
Methods: First, using NLP parsing, we divide the data into three base-forms depending on the biomedical entities related to phosphorylation, and further classify them into ten sub-forms based on their distribution relative to the phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify them using a set of features applicable to each sub-form.
Results: The performance of our methodology was evaluated on three corpora, namely PLC, iProLink and the hPP corpus. We obtained promising results of >85% F-score on the ten sub-forms of the training datasets in cross-validation. Our system achieved an overall F-score of 93.0% on the iProLink and 96.3% on the hPP corpus test datasets. Furthermore, our proposed system achieved the best performance in cross-corpus evaluation and outperformed the existing system with a recall of 90.1%.
Conclusions: The performance analysis of our unique system on three corpora reveals that it extracts protein phosphorylation information efficiently from both non-organism-specific general datasets such as PLC and iProLink and human-specific datasets such as the hPP corpus.
(C) 2018 Elsevier B.V. All rights reserved.
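The abstract reports results as F-score and recall. As a reminder of how these metrics relate, here is a minimal sketch in Python. The function name `f_score` and the counts in the usage line are hypothetical illustrations, not values or code from the paper, and the abstract does not say which F-score variant was used, so the balanced F1 (beta = 1) default is an assumption.

```python
def f_score(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from raw counts.

    tp: extracted items that are correct (true positives)
    fp: extracted items that are wrong (false positives)
    fn: correct items the system missed (false negatives)
    """
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # F-beta is the weighted harmonic mean of precision and recall.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts for illustration only (not taken from the paper).
precision, recall, f1 = f_score(tp=90, fp=10, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, a system cannot reach a high F-score on recall alone; both precision and recall have to be high.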
Pages: 57-64
Number of pages: 8