Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

Cited by: 90
Authors
Li, Bi-Qing [2 ,3 ]
Feng, Kai-Yan [4 ]
Chen, Lei [5 ]
Huang, Tao [2 ,3 ,6 ]
Cai, Yu-Dong [1 ]
Affiliations
[1] Shanghai Univ, Inst Syst Biol, Shanghai, Peoples R China
[2] Chinese Acad Sci, Shanghai Inst Biol Sci, Key Lab Syst Biol, Shanghai, Peoples R China
[3] Shanghai Ctr Bioinformat Technol, Shanghai, Peoples R China
[4] Beijing Genom Inst, Shenzhen, Peoples R China
[5] Shanghai Maritime Univ, Coll Informat Engn, Shanghai, Peoples R China
[6] Mt Sinai Sch Med, Dept Genet & Genom Sci, New York, NY USA
Source
PLOS ONE | 2012 / Volume 7 / Issue 08
Keywords
SECONDARY-STRUCTURE; SEQUENCE PROFILE; HOT-SPOTS; CLASSIFICATION; INTERFACES; PROGRAM; RESIDUE; IDENTIFICATION; INFORMATION; DATABASE;
DOI
10.1371/journal.pone.0043927
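Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract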
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict PPI sites and achieved an overall accuracy of 0.672997 and a Matthews correlation coefficient (MCC) of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of PPI sites. Site-specific feature analysis also showed that the features of individual residues from PPI sites contribute most to the determination of PPI sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
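The abstract names incremental feature selection (IFS) over an mRMR-ranked feature list, scored by MCC. A minimal sketch of that general idea follows. It is not the authors' implementation: the function names `mcc` and `incremental_feature_selection`, and the `evaluate` callback (a hypothetical stand-in for, say, a cross-validated Random Forest score), are assumptions made for illustration. The ranked list is assumed to come from a separate mRMR step that is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def incremental_feature_selection(
    ranked_features: List[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-k prefix of a ranked feature list for every k; keep the best.

    `ranked_features` is assumed to be ordered by an upstream ranking
    (e.g. mRMR). `evaluate` is a hypothetical callback that trains and
    scores a classifier on the given feature subset.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score
```

In practice `evaluate` would wrap model training and cross-validation; keeping it as a callback leaves the selection loop independent of any particular classifier library.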
Pages: 10
Related papers
50 in total
  • [1] Prediction of Protein-Protein Interaction Sites by Multifeature Fusion and RF with mRMR and IFS
    Zhang, JunYan
    Lyu, Yinghua
    Ma, Zhiqiang
    DISEASE MARKERS, 2022, 2022
  • [2] Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm
    Li, Minjie
    Wu, Ziheng
    Wang, Wenyan
    Lu, Kun
    Zhang, Jun
    Zhou, Yuming
    Chen, Zhaoquan
    Li, Dan
    Zheng, Shicheng
    Chen, Peng
    Wang, Bing
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (06) : 3646 - 3654
  • [3] A Cascade Random Forests Algorithm for Predicting Protein-Protein Interaction Sites
    Wei, Zhi-Sen
    Yang, Jing-Yu
    Shen, Hong-Bin
    Yu, Dong-Jun
    IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2015, 14 (07) : 746 - 760
  • [4] Predicting Citrullination Sites in Protein Sequences Using mRMR Method and Random Forest Algorithm
    Zhang, Qing
    Sun, Xijun
    Feng, Kaiyan
    Wang, ShaoPeng
    Zhang, Yu-Hang
    Wan, SiBao
    Lu, Lin
    Cai, Yu-Dong
    COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2017, 20 (02) : 164 - 173
  • [5] CONDITIONAL RANDOM FIELD BASED ALGORITHM FOR PROTEIN-PROTEIN INTERACTION PREDICTION
    Liu, Wei
    Chen, Ling
    Li, Bin
    OXIDATION COMMUNICATIONS, 2016, 39 (2A) : 1896 - 1906
  • [6] Random forest similarity for protein-protein interaction prediction from multiple sources
    Qi, YJ
    Klein-Seetharaman, J
    Bar-Joseph, Z
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2005, 2005, : 531 - 542
  • [7] Protein-protein interaction site prediction using random forest proximity distance
    Qiu, Zhijun
    Liu, Qingjie
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2021, 19 (01)
  • [8] Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods
    Liu, Yan
    Gu, Wenxiang
    Zhang, Wenyi
    Wang, Jianan
    BIOMED RESEARCH INTERNATIONAL, 2015, 2015
  • [9] Prediction of Aptamer Protein Interaction Using Random Forest Algorithm
    Manju, N.
    Samiha, C. M.
    Kumar, S. P. Pavan
    Gururaj, H. L.
    Flammini, Francesco
    IEEE ACCESS, 2022, 10 : 49677 - 49687
  • [10] Seeing the trees through the forest: sequence-based homo- and heteromeric protein-protein interaction sites prediction using random forest
    Hou, Qingzhen
    De Geest, Paul F. G.
    Vranken, Wim F.
    Heringa, Jaap
    Feenstra, K. Anton
    BIOINFORMATICS, 2017, 33 (10) : 1479 - 1487