Predicting RNA-binding sites of proteins using support vector machines and evolutionary information

Cited by: 96
Authors
Cheng, Cheng-Wei [1, 2]
Su, Emily Chia-Yu [1, 3, 4]
Hwang, Jenn-Kang [3]
Sung, Ting-Yi [1]
Hsu, Wen-Lian [1, 2]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Bioinformat Lab, Taipei, Taiwan
[2] Natl Tsing Hua Univ, Inst Informat Syst & Applicat, Hsinchu, Taiwan
[3] Natl Chiao Tung Univ, Inst Bioinformat, Hsinchu, Taiwan
[4] Acad Sinica, Taiwan Int Grad Program, Bioinformat Program, Taipei 115, Taiwan
Keywords
Support Vector Machine; Support Vector Machine Classifier; Slide Window Size; PSSM Profile; Window Size Selection;
DOI
10.1186/1471-2105-9-S12-S6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation, and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimental determination of RNA-protein interaction remains time-consuming and labor-intensive, so computational approaches for predicting RNA-binding sites in proteins have become highly desirable. Extensive studies of RNA-binding site prediction have led to the development of several methods. However, these methods can yield low sensitivity in exchange for high specificity.
Results: We propose a method, RNAProB, which incorporates a new smoothed position-specific scoring matrix (PSSM) encoding scheme with a support vector machine model to predict RNA-binding sites in proteins. Besides incorporating evolutionary information from standard PSSM profiles, the proposed smoothed PSSM encoding scheme also considers the correlation and dependency from the neighboring residues for each amino acid in a protein. Experimental results show that smoothed PSSM encoding significantly enhances the prediction performance, especially for sensitivity. Using five-fold cross-validation, our method performs better than the state-of-the-art systems by 4.90% to 6.83%, 0.88% to 5.33%, and 0.10 to 0.23 in terms of overall accuracy, specificity, and Matthews correlation coefficient, respectively. Most notably, compared to other approaches, RNAProB significantly improves sensitivity by 7.0% to 26.9% over the benchmark data sets. To prevent overfitting, a three-way data split procedure is incorporated to estimate the prediction performance. Moreover, physicochemical properties and amino acid preferences of RNA-binding proteins are examined and analyzed.
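The four measures reported in the Results (overall accuracy, sensitivity, specificity, and Matthews correlation coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion counts.

    Standard definitions; the function name and return layout are illustrative.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of true binding residues that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Because binding residues are usually much rarer than non-binding ones, a predictor can reach high accuracy and specificity while missing most binding sites, which is why sensitivity and MCC are reported alongside them.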
Conclusion: Our results demonstrate that the smoothed PSSM encoding scheme significantly enhances the performance of RNA-binding site prediction in proteins. This also supports our assumption that smoothed PSSM encoding can better resolve the ambiguity of discriminating between interacting and non-interacting residues by modelling the dependency from surrounding residues. The proposed method can also be applied to other research areas, such as DNA-binding site prediction, protein-protein interaction, and prediction of posttranslational modification sites.
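To make the idea of a smoothed PSSM concrete, here is a minimal sketch of one plausible reading: each residue's score row is replaced by the sum of the rows inside an odd-sized window centred on that residue, with positions beyond the sequence ends contributing nothing. The abstract does not give the exact formula, so the summation, the zero padding, the function name `smooth_pssm`, and the toy matrix are all assumptions made for illustration.

```python
def smooth_pssm(pssm, window=3):
    """Sum each PSSM row with its neighbours inside a centred window.

    Assumed scheme (not confirmed by the abstract): plain summation over an
    odd-sized window, with zero padding at the sequence termini.
    `pssm` is a list of rows, one per residue, each a list of scores.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(pssm)
    width = len(pssm[0]) if n else 0
    smoothed = []
    for i in range(n):
        row = [0.0] * width
        # Rows outside the sequence are skipped, i.e. treated as zero vectors.
        for j in range(max(0, i - half), min(n, i + half + 1)):
            for k in range(width):
                row[k] += pssm[j][k]
        smoothed.append(row)
    return smoothed


if __name__ == "__main__":
    # Toy 4-residue profile with 2 columns (a real PSSM has 20 columns).
    toy = [[1, 0], [2, 1], [4, 3], [0, 5]]
    print(smooth_pssm(toy, window=3))
```

With `window=1` the profile is returned unchanged (as floats), so the smoothing window size acts as a tunable parameter; the keywords of this record mention window size selection, but the values used by the authors are not stated here.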
Pages: 19