LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model

Cited by: 5
Authors
Pakhrin, Subash C. [1, 2]
Pokharel, Suresh [3]
Pratyush, Pawel [3]
Chaudhari, Meenal [4]
Ismail, Hamid D. [3]
Dukka, B. K. C. B. [3]
Affiliations
[1] Wichita State Univ, Sch Comp, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Dept Comp Sci, Houghton, MI 49931 USA
[4] North Carolina A&T State Univ, Dept Biol, Greensboro, NC 27411 USA
Funding
National Science Foundation (USA);
Keywords
post-translational modification; protein language model; phosphorylation; deep learning; stack generalization; score-level fusion; embedding; RESOURCE; ASSOCIATION; DATABASE;
DOI
10.1021/acs.jproteome.2c00667
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although several computational tools exist to predict phosphorylation sites, existing tools have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for general phosphorylation site prediction that integrates embeddings from the local window sequence and the contextualized embedding obtained using the global (overall) protein sequence from a pretrained protein language model to improve prediction performance. Thus, LMPhosSite consists of two base models: one for capturing an effective local representation and the other for capturing the global per-residue contextualized embedding from a pretrained protein language model. The outputs of these base models are integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthews correlation coefficient, and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, respectively, for the combined serine and threonine independent test data set and 34.90%, 62.03%, 0.298, and 44.67%, respectively, for the tyrosine independent test data set, which is better than the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for the prediction of general phosphorylation sites in proteins.
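The two ideas in the abstract, a local window around each candidate residue and the fusion of two base-model scores, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the window half-width, the padding character, and the plain weighted average used for fusion are all assumptions made for illustration; the abstract does not state the window size or how the fusion is computed. The last function only checks that the reported F1-scores are consistent with the reported precision and recall.

```python
def extract_window(sequence: str, site: int, half_width: int = 16, pad: str = "-") -> str:
    """Return the residues centered on `site` (0-based index), padded at the termini.

    `half_width` and `pad` are hypothetical defaults, not values from the paper.
    """
    left = max(0, site - half_width)
    right = min(len(sequence), site + half_width + 1)
    window = sequence[left:right]
    # Pad so that the candidate residue always sits at the center of the window.
    left_pad = pad * (half_width - (site - left))
    right_pad = pad * (half_width - (right - 1 - site))
    return left_pad + window + right_pad


def fuse_scores(local_score: float, global_score: float, weight: float = 0.5) -> float:
    """Combine two base-model probabilities with a weighted average.

    A weighted average is one simple form of score-level fusion and is an
    assumption here; the fusion in the paper may be learned instead.
    """
    return weight * local_score + (1.0 - weight) * global_score


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy_sequence = "MKSPTAY"  # toy sequence, not real data
    print(extract_window(toy_sequence, site=2, half_width=3))
    print(fuse_scores(0.8, 0.4))
    # Consistency check against the abstract's serine/threonine and tyrosine results.
    print(round(100 * f1_from_precision_recall(0.3878, 0.6712), 2))
    print(round(100 * f1_from_precision_recall(0.3490, 0.6203), 2))
```

Recomputing F1 from the rounded precision and recall reproduces the reported values to within rounding, which supports reading the four reported numbers in the order precision, recall, Matthews correlation coefficient, F1-score.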
Pages: 2548-2557
Page count: 10