Applying the Naive Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites

Cited by: 196
Authors
Murakami, Yoichi [1 ]
Mizuguchi, Kenji [1 ]
Affiliations
[1] Natl Inst Biomed Innovat, Osaka, Japan
Keywords
BINDING-SITES; SOLVENT ACCESSIBILITY; SECONDARY STRUCTURE; SEQUENCE PROFILE; DATA-BANK; DATABASE; INFORMATION; INTERFACES; NETWORKS;
DOI
10.1093/bioinformatics/btq302
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The limited availability of protein structures often restricts the functional annotation of proteins and the identification of their protein-protein interaction sites. Computational methods to identify interaction sites from protein sequences alone are, therefore, required for unraveling the functions of many proteins. This article describes a new method (PSIVER) to predict interaction sites, i.e. residues binding to other proteins, in protein sequences. Only sequence features (position-specific scoring matrix and predicted accessibility) are used for training a Naive Bayes classifier (NBC), and the conditional probabilities of each sequence feature are estimated using kernel density estimation (KDE).
Results: The leave-one-out cross-validation of PSIVER achieved a Matthews correlation coefficient (MCC) of 0.151, an F-measure of 35.3%, a precision of 30.6% and a recall of 41.6% on a non-redundant set of 186 protein sequences extracted from 105 heterodimers in the Protein Data Bank (consisting of 36 219 residues, of which 15.2% were known interface residues). Even though the dataset used for training was highly imbalanced, a randomization test demonstrated that the proposed method managed to avoid overfitting. PSIVER was also tested on 72 sequences not used in training (consisting of 18 140 residues, of which 10.6% were known interface residues), and achieved an MCC of 0.135, an F-measure of 31.5%, a precision of 25.0% and a recall of 46.5%, outperforming other publicly available servers tested on the same dataset. PSIVER enables experimental biologists to identify potential interface residues in unknown proteins from sequence information alone, and to mutate those residues selectively in order to unravel protein functions.
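The general idea of a Naive Bayes classifier whose per-feature class-conditional densities come from kernel density estimation can be sketched as follows. This is a minimal illustration and not the PSIVER implementation: the Gaussian kernel, the fixed `bandwidth`, the names `gaussian_kde_1d` and `KdeNaiveBayes`, and the synthetic two-feature data are all assumptions made for this example. The abstract does not specify the kernel or the bandwidth selection.

```python
import numpy as np


def gaussian_kde_1d(samples, bandwidth):
    """Return a function estimating a 1-D density from samples (Gaussian kernel)."""
    samples = np.asarray(samples, dtype=float)
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # One kernel per training sample, evaluated at every query point.
        z = (x[:, None] - samples[None, :]) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

    return density


class KdeNaiveBayes:
    """Naive Bayes with one 1-D KDE per (class, feature) pair (illustrative only)."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # assumed fixed; hypothetical default

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors estimated from class frequencies.
        self.log_prior_ = {c: np.log(np.mean(y == c)) for c in self.classes_}
        # Independence assumption: each feature gets its own density per class.
        self.kdes_ = {
            c: [gaussian_kde_1d(X[y == c, j], self.bandwidth)
                for j in range(X.shape[1])]
            for c in self.classes_
        }
        return self

    def log_joint(self, X):
        """Unnormalized log posterior: log P(c) + sum_j log p(x_j | c)."""
        X = np.asarray(X, dtype=float)
        scores = np.zeros((X.shape[0], len(self.classes_)))
        for k, c in enumerate(self.classes_):
            scores[:, k] = self.log_prior_[c]
            for j, kde in enumerate(self.kdes_[c]):
                # Small constant guards against log(0) far from all samples.
                scores[:, k] += np.log(kde(X[:, j]) + 1e-300)
        return scores

    def predict(self, X):
        return self.classes_[np.argmax(self.log_joint(X), axis=1)]


# Synthetic, well-separated data: label 1 stands in for "interface residue".
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
                     rng.normal(2.0, 0.5, size=(50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

model = KdeNaiveBayes(bandwidth=0.5).fit(X_train, y_train)
predictions = model.predict([[-2.0, -2.0], [2.0, 2.0]])
```

On an imbalanced dataset such as the one described above, a decision rule based on the largest posterior tends to favor the majority class, so a practical predictor would likely threshold the posterior ratio instead. That threshold is a design choice the abstract does not detail.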
Pages: 1841-1848
Page count: 8
Related papers
50 items in total
  • [1] Prediction of Protein-Protein Interaction Sites Based on Naive Bayes Classifier
    Geng, Haijiang
    Lu, Tao
    Lin, Xiao
    Liu, Yu
    Yan, Fangrong
    [J]. BIOCHEMISTRY RESEARCH INTERNATIONAL, 2015, 2015
  • [2] Prediction of Protein Acetylation Sites using Kernel Naive Bayes Classifier Based on Protein Sequences Profiling
    Ahmed, Md. Shakil
    Shahjaman, Md.
    Kabir, Enamul
    Kamruzzaman, Md.
    [J]. BIOINFORMATION, 2018, 14 (05) : 213 - 218
  • [3] The Prediction of Protein-Protein Interaction Sites Based on RBF Classifier Improved by SMOTE
    Li, Hui
    Pi, Dechang
    Wang, Chishe
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [4] Identification of Interface Residues Involved in Protein-Protein Interactions Using Naive Bayes Classifier
    Wang, Chise
    Cheng, Jiaxing
    Su, Shoubao
    Xu, Dongzhe
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2008, 5139 : 207 - +
  • [5] Prediction of protein secondary structures with a novel kernel density estimation based classifier
    Chang D.T.-H.
    Ou Y.-Y.
    Hung H.-G.
    Yang M.-H.
    Chen C.-Y.
    Oyang Y.-J.
    [J]. BMC Research Notes, 1 (1)
  • [6] Identifying Protein-Protein Interaction Sites Using Adapted Bayesian Classifier
    Wang, Chishe
    Song, Jie
    Li, Fangping
    Lv, Junsong
    [J]. 2009 ISECS INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT, VOL I, 2009, : 153 - +
  • [7] Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier
    Dhole, Kaustubh
    Singh, Gurdeep
    Pai, Priyadarshini P.
    Mondal, Sukanta
    [J]. JOURNAL OF THEORETICAL BIOLOGY, 2014, 348 : 47 - 54
  • [8] Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis
    Wang, Xue
    Zhang, Yaqun
    Yu, Bin
    Salhi, Adil
    Chen, Ruixin
    Wang, Lin
    Liu, Zengfeng
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2021, 134
  • [9] In Silico Protein-Protein Interaction Prediction with Sequence Alignment and Classifier Stacking
    Marini, Simone
    Xu, Qian
    Yang, Qiang
    [J]. CURRENT PROTEIN & PEPTIDE SCIENCE, 2011, 12 (07) : 614 - 620
  • [10] Prediction of Protein Functions from Protein Interaction Networks: A Naive Bayes Approach
    Nguyen, Cao D.
    Gardiner, Katheleen J.
    Nguyen, Duong
    Cios, Krzysztof J.
    [J]. PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE, 2008, 5351 : 788 - +