LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Times cited: 7
Authors
Pakhrin, Subash C. [1 ,2 ]
Pokharel, Suresh [3 ]
Aoki-Kinoshita, Kiyoko F. [4 ]
Beck, Moriah R. [5 ]
Dam, Tarun K. [6 ]
Caragea, Doina [7 ]
Kc, Dukka B. [3 ]
Affiliations
[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA
[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan
[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA
[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA
[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; N-linked glycosylation; post-translation modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS;
DOI
10.1093/glycob/cwad033
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Protein N-linked glycosylation is an important post-translational modification in Homo sapiens and plays essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not every N-X-[S/T] sequon is glycosylated, so the sequon is a necessary but not sufficient determinant of N-linked glycosylation. Computational prediction of which N-X-[S/T] sequons are actually glycosylated is therefore an important problem, and existing methods have not addressed it extensively, particularly with respect to how negative sets are constructed and how the information distilled in protein language models (pLMs) can be leveraged. Here, we developed LMNglyPred, a deep learning-based approach that predicts N-linked glycosylation sites in human proteins using embeddings from a pre-trained pLM. On an independent benchmark test set, LMNglyPred achieves a sensitivity of 76.50%, a specificity of 75.36%, a Matthews Correlation Coefficient (MCC) of 0.49, a precision of 60.99%, and an accuracy of 75.74%. These results demonstrate that LMNglyPred is a robust computational tool for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon.
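The abstract rests on two concrete ingredients: the N-X-[S/T] rule that defines which asparagines are candidate sites, and the five metrics used to report performance. The Python sketch below illustrates both under stated assumptions. It is not LMNglyPred itself (no language-model embeddings or classifier are involved); it only enumerates the candidate sites that a predictor of this kind would score, and it computes the standard confusion-matrix formulas behind the reported numbers. The function names `find_sequon_sites` and `classification_metrics` are hypothetical. X is assumed to be one of the 19 standard non-proline residues, and the counts in the usage lines are illustrative, not taken from the paper.

```python
import math
import re

# Sequon per the abstract: N, then any residue except proline, then S or T.
# Assumption: X is restricted to the 19 standard non-proline amino acids, so
# ambiguous symbols such as 'X' or 'B' never count as the middle residue.
NON_PROLINE = "ACDEFGHIKLMNQRSTVWY"
SEQUON = re.compile(rf"(?=N[{NON_PROLINE}][ST])")  # lookahead allows overlapping hits


def find_sequon_sites(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Standard binary metrics from a confusion matrix (positive = glycosylated).

    Assumes every denominator below is nonzero; MCC falls back to 0.0 if undefined.
    """
    sensitivity = tp / (tp + fn)  # recall on glycosylated sequons
    specificity = tn / (tn + fp)  # recall on non-glycosylated sequons
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy sequence, not a real protein: NGT, NVT and NQS qualify; NPS does not.
    print(find_sequon_sites("MNGTANPSKNVTNQS"))  # [2, 10, 13]
    # Illustrative counts only, not LMNglyPred's results.
    print(classification_metrics(tp=8, fn=2, tn=15, fp=5))
```

The lookahead keeps overlapping sequons, so in a stretch such as `NNST` both asparagines are reported. MCC lies on a scale from -1 to 1 rather than being a percentage, which is why the abstract reports it as 0.49 while the other four metrics are given in percent.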
Pages: 411-422
Page count: 12