Improving protein succinylation sites prediction using embeddings from protein language model

Cited by: 21
Authors
Pokharel, Suresh [1]
Pratyush, Pawel [1]
Heinzinger, Michael [2, 3]
Newman, Robert H. [4, 5]
KC, Dukka B. [1]
Affiliations
[1] Michigan Technol Univ, Dept Comp Sci, Houghton, MI 49931 USA
[2] TUM Tech Univ Munich, Dept Informat Bioinformat & Computat Biol I12, Boltzmannstr 3, D-85748 Garching, Germany
[3] Ctr Doctoral Studies Informat & Its Applicat CeDo, TUM Grad Sch, Boltzmannstr 11, D-85748 Garching, Germany
[4] North Carolina A&T State Univ, Coll Sci & Technol, Dept Biol, Greensboro, NC USA
[5] Univ N Carolina, Dept Chem, Chapel Hill, NC 27515 USA
Funding
National Science Foundation (USA)
Keywords
LYSINE SUCCINYLATION; UNIREF
DOI
10.1038/s41598-022-21366-2
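Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract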
Protein succinylation is an important post-translational modification (PTM) responsible for many vital metabolic activities in cells, including cellular respiration, regulation, and repair. Here, we present a novel approach that combines features from a supervised word embedding with embeddings from a protein language model called ProtT5-XL-UniRef50 (hereafter ProtT5) in a deep learning framework to predict protein succinylation sites. To our knowledge, this is one of the first attempts to employ embeddings from a pre-trained protein language model to predict protein succinylation sites. The proposed model, dubbed LMSuccSite, achieves state-of-the-art results compared to existing methods, with performance scores of 0.36, 0.79, and 0.79 for MCC, sensitivity, and specificity, respectively. LMSuccSite is likely to serve as a valuable resource for exploring succinylation and its role in cellular physiology and disease.
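For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how MCC, sensitivity, and specificity are computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the paper's data and do not reproduce its evaluation code.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity, and MCC from confusion-matrix counts."""
    # Sensitivity (recall): fraction of true sites that are predicted as sites.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that are predicted as non-sites.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Hypothetical, class-imbalanced counts (NOT taken from the paper):
# 100 positive sites and 1000 negative sites.
example = binary_metrics(tp=79, fp=210, tn=790, fn=21)
print(example)
```

With these made-up counts, sensitivity and specificity are both 0.79 while MCC is considerably lower, which illustrates how class imbalance can keep MCC modest even when the other two scores look balanced.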
Pages: 13
Related papers
50 in total
  • [1] Improving protein succinylation sites prediction using embeddings from protein language model
    Suresh Pokharel
    Pawel Pratyush
    Michael Heinzinger
    Robert H. Newman
    Dukka B. KC
    Scientific Reports, 12
  • [2] Superior protein thermophilicity prediction with protein language model embeddings
    Haselbeck, Florian
    John, Maura
    Zhang, Yuqi
    Pirnay, Jonathan
    Fuenzalida-Werner, Juan Pablo
    Costa, Ruben D.
    Grimm, Dominik G.
    NAR GENOMICS AND BIOINFORMATICS, 2023, 5 (04)
  • [3] An analysis of protein language model embeddings for fold prediction
    Villegas-Morcillo, Amelia
    Gomez, Angel M.
    Sanchez, Victoria
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (03)
  • [4] Improving protein-protein interaction prediction using protein language model and protein network features
    Hu, Jun
    Li, Zhe
    Rao, Bing
    Thafar, Maha A.
    Arif, Muhammad
    ANALYTICAL BIOCHEMISTRY, 2024, 693
  • [5] LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model
    Pakhrin, Subash C.
    Pokharel, Suresh
    Aoki-Kinoshita, Kiyoko F.
    Beck, Moriah R.
    Dam, Tarun K.
    Caragea, Doina
    Kc, Dukka B.
    GLYCOBIOLOGY, 2023, 33 (05) : 411 - 422
  • [6] Classifying alkaliphilic proteins using embeddings from protein language model
    Susanty M.
    Naim Mursalim M.K.
    Hertadi R.
    Purwarianti A.
    Rajab T.L.
    Computers in Biology and Medicine, 2024, 173
  • [7] Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction
    Weissenow, Konstantin
    Heinzinger, Michael
    Rost, Burkhard
    STRUCTURE, 2022, 30 (08) : 1169 - +
  • [8] Detecting Succinylation sites from protein sequences using ensemble support vector machine
    Qiao Ning
    Xiaosa Zhao
    Lingling Bao
    Zhiqiang Ma
    Xiaowei Zhao
    BMC Bioinformatics, 19
  • [9] Detecting Succinylation sites from protein sequences using ensemble support vector machine
    Ning, Qiao
    Zhao, Xiaosa
    Bao, Lingling
    Ma, Zhiqiang
    Zhao, Xiaowei
    BMC BIOINFORMATICS, 2018, 19
  • [10] Prediction of Protein-DNA Binding Sites Based on Protein Language Model and Deep Learning
    Shan, Kaixuan
    Zhang, Xiankun
    Song, Chen
    ADVANCED INTELLIGENT COMPUTING IN BIOINFORMATICS, PT II, ICIC 2024, 2024, 14882 : 314 - 325