Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines

Cited by: 1
Authors
Raja, Kalpana [1 ,2 ]
Natarajan, Jeyakumar [1 ]
Affiliations
[1] Bharathiar Univ, Sch Life Sci, Dept Bioinformat, Data Min & Text Min Lab, Coimbatore 641046, Tamil Nadu, India
[2] Univ Michigan, Sch Med, Dept Dermatol, Ann Arbor, MI USA
Keywords
Human protein phosphorylation; hPP corpus; Support Vector Machines; Natural language processing; Information extraction; Post transcriptional modification; EXTRACTION; DATABASE; SYSTEM;
DOI
10.1016/j.cmpb.2018.03.022
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background: Extraction of protein phosphorylation information from biomedical literature has gained much attention because of its importance in numerous biological processes.
Objective: In this study, we propose a text mining methodology consisting of two phases, NLP parsing and SVM classification, to extract phosphorylation information from the literature.
Methods: First, using NLP parsing, we divide the data into three base-forms depending on the biomedical entities related to phosphorylation, and further classify them into ten sub-forms based on their distribution relative to the phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify them using a set of features applicable to each sub-form.
Results: The performance of our methodology was evaluated on three corpora, namely PLC, iProLink and the hPP corpus. We obtained promising results of >85% F-score on the ten sub-forms of the training datasets in cross-validation tests. Our system achieved an overall F-score of 93.0% on the iProLink and 96.3% on the hPP corpus test datasets. Furthermore, our proposed system achieved the best performance in cross-corpus evaluation and outperformed the existing system with a recall of 90.1%.
Conclusions: The performance analysis of our system on three corpora shows that it extracts protein phosphorylation information efficiently both from non-organism-specific general datasets such as PLC and iProLink and from a human-specific dataset such as the hPP corpus. (C) 2018 Elsevier B.V. All rights reserved.
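The abstract reports F-score and recall values without defining them. As a minimal sketch, assuming the standard balanced F-score (F1, the harmonic mean of precision and recall), the metrics can be computed from counts of true positives, false positives and false negatives. The function name and the example counts below are hypothetical and are not taken from the paper, which may use a different F-score variant.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Assumes the balanced F-score (F1); a zero denominator yields 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why a system is usually reported with both a recall figure and an F-score.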
Pages: 57 - 64
Page count: 8
Related papers
50 items in total
  • [21] Mining gene-related information from biomedical literature
    Tudor, Catalina O.
    Vijay-Shanker, K.
    Schmidt, Carl J.
    BIBMW: 2009 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOP, 2009, : 335 - 335
  • [22] Incorporating Zoning Information into Argument Mining from Biomedical Literature
    Liu, Boyang
    Schlegel, Viktor
    Batista-Navarro, Riza
    Ananiadou, Sophia
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6162 - 6169
  • [23] Prediction of protein-protein interactions using support vector machines
    Dohkan, S
    Koike, A
    Takagi, T
    BIBE 2004: FOURTH IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING, PROCEEDINGS, 2004, : 576 - 583
  • [24] Extraction of the cancer information from microarray of gene expression using Support Vector Machines
    Wilinski, A
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS IV, 2006, 6159
  • [25] Mining Informative Hydrologic Data by Using Support Vector Machines and Elucidating Mined Data according to Information Entropy
    Chen, Shien-Tsung
    ENTROPY, 2015, 17 (03) : 1023 - 1041
  • [26] Named Entity Recognition in Biomedical Literature: A Comparison of Support Vector Machines and Conditional Random Fields
    Liu, Feng
    Chen, Yifei
    Manderick, Bernard
    ENTERPRISE INFORMATION SYSTEMS-BOOKS, 2008, 12 : 137 - 147
  • [27] Prediction of protein solvent accessibility using support vector machines
    Yuan, Z
    Burrage, K
    Mattick, JS
    PROTEINS-STRUCTURE FUNCTION AND GENETICS, 2002, 48 (03) : 566 - 570
  • [28] Transmembrane protein topology prediction using support vector machines
    Nugent, Timothy
    Jones, David T.
    BMC BIOINFORMATICS, 2009, 10
  • [29] Transmembrane protein topology prediction using support vector machines
    Timothy Nugent
    David T Jones
    BMC Bioinformatics, 10
  • [30] Prediction of protein subcellular locations using support vector machines
    Li, NN
    Niu, XH
    Shi, F
    Li, XY
    ADVANCES IN NATURAL COMPUTATION, PT 1, PROCEEDINGS, 2005, 3610 : 1047 - 1051