LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Cited by: 7
Authors
Pakhrin, Subash C. [1 ,2 ]
Pokharel, Suresh [3 ]
Aoki-Kinoshita, Kiyoko F. [4 ]
Beck, Moriah R. [5 ]
Dam, Tarun K. [6 ]
Caragea, Doina [7 ]
Kc, Dukka B. [3 ]
Affiliations
[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA
[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan
[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA
[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA
[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; N-linked glycosylation; post-translation modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS;
DOI
10.1093/glycob/cwad033
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by the existing methods, especially with regard to the creation of negative sets and the use of distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. LMNglyPred produces sensitivity, specificity, precision, and accuracy of 76.50%, 75.36%, 60.99%, and 75.74%, respectively, and a Matthews Correlation Coefficient of 0.49, on a benchmark-independent test set. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.
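The sequon rule stated in the abstract can be illustrated with a minimal sketch. It only enumerates candidate N-X-[S/T] positions, the sites a predictor such as LMNglyPred would then classify; it is not the authors' code and it does not predict glycosylation. The function name `find_sequons` and the example sequence are hypothetical, and the input is assumed to be a plain one-letter amino acid string.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of the Asn in every N-X-[S/T] sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up toy sequence, not a real protein.
    print(find_sequons("MNGTANPSNNTS"))
```

Because the sequon is necessary but not sufficient, every position returned here is only a candidate; deciding which candidates are actually glycosylated is the prediction problem the abstract describes.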
Pages: 411-422
Page count: 12