DeepMPSF: A Deep Learning Network for Predicting General Protein Phosphorylation Sites Based on Multiple Protein Sequence Features

Cited by: 2
Authors
Xie, Jingxin [1]
Quan, Lijun [1, 2, 3]
Wang, Xuejiao [1]
Wu, Hongjie [4]
Jin, Zhi [1]
Pan, Deng [1]
Chen, Taoning [1]
Wu, Tingfang [1, 2, 3]
Lyu, Qiang [1, 2, 3]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China
[3] Collaborat Innovat Ctr Novel Software Technol & In, Nanjing 210000, Peoples R China
[4] Suzhou Univ Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
SUBSTRATE; RESOURCE; SERVER;
DOI
10.1021/acs.jcim.3c00996
Chinese Library Classification
R914 [Medicinal Chemistry]
Subject classification code
100701
Abstract
Phosphorylation, one of the most important post-translational modifications, plays a key role in various cellular physiological processes and in disease occurrence. In recent years, computational methods have increasingly been applied to the prediction of protein phosphorylation sites. However, most existing methods rely on simple protein sequence features that provide limited contextual information. To overcome this limitation, we propose DeepMPSF, a phosphorylation site prediction model based on multiple protein sequence features. It uses two types of features: sequence semantic features, which comprise residue type information and relative position information within the protein sequence, and protein background biophysical features, which carry global semantic information, a more comprehensive description of the protein background, obtained from pretrained models. To extract these features, DeepMPSF employs two separate subnetworks, the S71SFE module and the BBFE module, which automatically extract high-level semantic features. The model handles imbalanced datasets through an ensemble learning strategy applied during both training and prediction. DeepMPSF is trained and evaluated on a well-established dataset of human proteins. Comparison with other benchmark methods shows that DeepMPSF outperforms them in predicting both S/T residues and Y residues. In particular, DeepMPSF generalizes well in cross-species blind tests, with average improvements of 5.63%/5.72%, 22.28%/25.94%, 20.11%/17.49%, and 26.40%/28.33% on the Mus musculus/Rattus norvegicus test sets in the area under the ROC curve (AUC), the area under the PR curve, F1-score, and MCC, respectively. It also performs well on the most recently updated case of natural proteins with functional phosphorylation sites. Through an ablation study and visual analysis, we find that the design of the different feature modules contributes significantly to the classification accuracy of DeepMPSF, which provides valuable insights for predicting phosphorylation sites and offers effective support for future downstream research.
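The abstract reports F1-score and MCC alongside the two AUC metrics. As a reminder of how these two threshold-based metrics follow from a binary confusion matrix, here is a minimal Python sketch. The function names and the toy labels are illustrative assumptions, not taken from the paper or its code, and the paper's own evaluation pipeline may differ.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, where 1 marks a phosphorylated site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def f1_score(tp, fp, tn, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy, hypothetical labels: 3 positive sites among 10 candidate residues.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
f1 = f1_score(*counts)
m = mcc(*counts)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is commonly reported for imbalanced problems such as phosphorylation site prediction, where non-phosphorylated residues far outnumber phosphorylated ones.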
Pages: 7258-7271
Page count: 14