Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis

Times cited: 39
Authors
Wang, Xue [1,2]
Zhang, Yaqun [1,2]
Yu, Bin [1,2,3]
Salhi, Adil [4]
Chen, Ruixin [1,2]
Wang, Lin [1,2]
Liu, Zengfeng [1,2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Math & Phys, Qingdao 266061, Peoples R China
[2] Qingdao Univ Sci & Technol, Artificial Intelligence & Biomed Big Data Res Ctr, Qingdao 266061, Peoples R China
[3] Sci Computat Lab, Applicat Hainan Prov, Haikou 571158, Hainan, Peoples R China
[4] King Abdullah Univ Sci & Technol KAUST, Computat Bioscience Res Ctr CBRC, Thuwal 23955, Saudi Arabia
Funding
National Natural Science Foundation of China
Keywords
Protein-protein interaction sites; Feature extraction; SMOTE; KPCA; XGBoost; SEQUENCE-BASED PREDICTION; SECONDARY STRUCTURE; CLASSIFIER; IDENTIFICATION; LOCALIZATION; DESCRIPTORS; IDENTIFY; NETWORKS; EEG;
DOI
10.1016/j.compbiomed.2021.104516
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Predicting protein-protein interaction sites (PPI sites) can provide important clues for understanding biological activity. Using machine learning to predict PPI sites can reduce the cost of expensive and time-consuming biological experiments. Here we propose PPISP-XGBoost, a novel PPI site prediction method based on eXtreme gradient boosting (XGBoost). First, protein features are extracted within a sliding window using the pseudo-position specific scoring matrix (PsePSSM), pseudo-amino acid composition (PseAAC), the hydropathy index and the solvent accessible surface area (ASA). Next, these raw features are preprocessed to obtain representations that support better prediction. In particular, the synthetic minority oversampling technique (SMOTE) is used to counter class imbalance, and kernel principal component analysis (KPCA) is applied to remove redundant features. Finally, the optimized features are fed to an XGBoost classifier to identify PPI sites. With PPISP-XGBoost, the prediction accuracy reaches 85.4% on the training dataset Dset186, and 85.3%, 83.9%, 85.8% and 85.4% on the independent validation datasets Dtestset72, PDBtestset164, Dset_448 and Dset_355, respectively. Each of these accuracies is an improvement over existing PPI site prediction methods. These results demonstrate that PPISP-XGBoost can further improve the prediction of PPI sites.
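The preprocessing chain described in the abstract runs from sliding-window features through SMOTE and KPCA to a classifier. A minimal NumPy sketch can illustrate that chain. This is not the authors' implementation. The helper names `sliding_window`, `smote` and `rbf_kpca` are hypothetical. A random per-residue matrix stands in for the PsePSSM, PseAAC, hydropathy and ASA descriptors, which are not computed here. The window size of 5, `k=3`, `gamma=0.05` and 8 components are arbitrary illustrative values, because this record does not give the paper's settings. The XGBoost step is only indicated in a comment.

```python
import numpy as np


def sliding_window(features, window=9):
    """Concatenate per-residue feature rows over a window centred on each residue.

    features: (L, d) array of per-residue descriptors. Positions beyond either
    chain end are zero-padded. Returns an (L, window * d) array.
    """
    L, d = features.shape
    half = window // 2
    pad = np.zeros((half, d))
    padded = np.vstack([pad, features, pad])
    return np.stack([padded[i:i + window].ravel() for i in range(L)])


def smote(X_min, n_new, k=5, seed=None):
    """Textbook SMOTE: each synthetic sample lies on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)             # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def rbf_kpca(X, n_components=2, gamma=0.1):
    """Kernel PCA with an RBF kernel; returns the projections of the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # centre the kernel in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)       # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    lambdas = np.clip(eigvals[top], 0.0, None)
    # Projection of training point i onto component j is sqrt(lambda_j) * v_ij
    return eigvecs[:, top] * np.sqrt(lambdas)


# Toy usage on random data (hypothetical sizes and labels, not the paper's data)
rng = np.random.default_rng(0)
per_residue = rng.normal(size=(30, 4))         # 30 residues x 4 descriptors
X = sliding_window(per_residue, window=5)      # shape (30, 20)
y = np.zeros(30, dtype=int)
y[:6] = 1                                      # 6 "interface" residues, 24 others
X_new = smote(X[y == 1], n_new=18, k=3, seed=1)
X_bal = np.vstack([X, X_new])
y_bal = np.concatenate([y, np.ones(18, dtype=int)])
Z = rbf_kpca(X_bal, n_components=8, gamma=0.05)
# Z and y_bal would then go to a gradient-boosted tree classifier
# (for example xgboost.XGBClassifier); that step is not run here.
```

Maintained library versions of these steps exist: `imblearn.over_sampling.SMOTE`, `sklearn.decomposition.KernelPCA` and `xgboost.XGBClassifier`. Oversampling is normally applied only to the training portion of the data, so that synthetic samples do not leak into evaluation.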
Pages: 13