A boosting approach for prediction of protein-RNA binding residues

Citations: 34
Authors
Tang, Yongjun [1, 2, 3]
Liu, Diwei [4]
Wang, Zixiang [4]
Wen, Ting [4]
Deng, Lei [4]
Affiliations
[1] Cent South Univ, Xiangya Hosp, Dept Clin Pharmacol, 87 Xiangya Rd, Changsha 410008, Hunan, Peoples R China
[2] Cent South Univ, Hunan Key Lab Pharmacogenet, Inst Clin Pharmacol, 87 Xiangya Rd, Changsha 410008, Hunan, Peoples R China
[3] Cent South Univ, Xiangya Hosp, Dept Pediat, 87 Xiangya Rd, Changsha 410008, Hunan, Peoples R China
[4] Cent South Univ, Sch Software, 22 Shaoshan South Rd, Changsha 410075, Hunan, Peoples R China
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Funding
National Natural Science Foundation of China;
Keywords
RNA-binding residue; Gradient tree boosting; Structural neighborhood features; INTERACTION HOT-SPOTS; SOLVENT ACCESSIBILITY; SITES; RECOGNITION; IDENTIFICATION; NUCLEOTIDES; ANNOTATION; GENERATION; IMPROVES; SVM;
DOI
10.1186/s12859-017-1879-2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: RNA-binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Distinguishing the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex. Results: We propose PredRBR, an effective computational approach to predict RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, an MCC of 0.55, and an AUC of 0.92, which are significantly better than those of other widely used machine learning algorithms such as Support Vector Machine, Random Forest, and AdaBoost. We further calculate the feature importance of different feature categories and find that structural neighborhood characteristics are critical in the recognition of RNA-binding residues. PredRBR also yields significantly better prediction accuracy on an independent test set (RBP101) than other state-of-the-art methods. Conclusions: The superior performance over existing RNA-binding residue prediction methods indicates the importance of the gradient tree boosting algorithm combined with the optimally selected features.
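The abstract reports accuracy, sensitivity, and MCC (Matthews correlation coefficient) for a binary residue classifier. As a minimal sketch of how these three metrics follow from confusion-matrix counts, the snippet below uses the standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of true binding residues recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC balances all four cells, which matters when binding residues
    # are a small minority of all residues.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


# Hypothetical counts for illustration only (not the paper's data).
metrics = binary_metrics(tp=85, fp=160, tn=840, fn=15)
```

On imbalanced data such as binding-site prediction, accuracy and sensitivity can both look high while MCC stays moderate, which is one reason the three are typically reported together.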
Pages: 12
Related papers
50 in total
  • [1] A boosting approach for prediction of protein-RNA binding residues
    Yongjun Tang
    Diwei Liu
    Zixiang Wang
    Ting Wen
    Lei Deng
    [J]. BMC Bioinformatics, 18
  • [2] XGBPRH: Prediction of Binding Hot Spots at Protein-RNA Interfaces Utilizing Extreme Gradient Boosting
    Deng, Lei
    Sui, Yuanchao
    Zhang, Jingpu
    [J]. GENES, 2019, 10 (03)
  • [3] FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues
    EL-Manzalawy, Yasser
    Abbas, Mostafa
    Malluhi, Qutaibah
    Honavar, Vasant
    [J]. PLOS ONE, 2016, 11 (07)
  • [4] A convolutional network and attention mechanism-based approach to predict protein-RNA binding residues
    Li, Ke
    Wu, Hongwei
    Yue, Zhenyu
    Sun, Yu
    Xia, Chuan
    [J]. COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2023, 105
  • [5] Efficient mapping of RNA-binding residues in RNA-binding proteins using local sequence features of binding site residues in protein-RNA complexes
    Agarwal, Ankita
    Kant, Shri
    Bahadur, Ranjit Prasad
    [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2023, 91 (09) : 1361 - 1379
  • [6] A structure-based model for the prediction of protein-RNA binding affinity
    Nithin, Chandran
    Mukherjee, Sunandan
    Bahadur, Ranjit Prasad
    [J]. RNA, 2019, 25 (12) : 1628 - 1645
  • [7] Integrating thermodynamic and sequence contexts improves protein-RNA binding prediction
    Su, Yufeng
    Luo, Yunan
    Zhao, Xiaoming
    Liu, Yang
    Peng, Jian
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2019, 15 (09)
  • [8] The dataset for protein-RNA binding affinity
    Yang, Xiufeng
    Li, Haotian
    Huang, Yangyu
    Liu, Shiyong
    [J]. PROTEIN SCIENCE, 2013, 22 (12) : 1808 - 1811
  • [9] Individually double minimum-distance definition of protein-RNA binding residues and application to structure-based prediction
    Hu, Wen
    Qin, Liu
    Li, Menglong
    Pu, Xuemei
    Guo, Yanzhi
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2018, 32 (12) : 1363 - 1373
  • [10] Prediction of protein-RNA binding sites by a random forest method with combined features
    Liu, Zhi-Ping
    Wu, Ling-Yun
    Wang, Yong
    Zhang, Xiang-Sun
    Chen, Luonan
    [J]. BIOINFORMATICS, 2010, 26 (13) : 1616 - 1622