LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Cited by: 7
Authors
Pakhrin, Subash C. [1, 2]
Pokharel, Suresh [3]
Aoki-Kinoshita, Kiyoko F. [4]
Beck, Moriah R. [5]
Dam, Tarun K. [6]
Caragea, Doina [7]
Kc, Dukka B. [3]
Affiliations
[1] Wichita State Univ, Sch Comp, 1845 Fairmount St, Wichita, KS 67260 USA
[2] Univ Houston Downtown, Dept Comp Sci & Engn Technol, Houston, TX 77002 USA
[3] Michigan Technol Univ, Coll Comp, Dept Comp Sci, Houghton, MI 49931 USA
[4] Soka Univ, Glycan & Life Syst Integrat Ctr GaLSIC, Tokyo 1928577, Japan
[5] Wichita State Univ, Dept Chem & Biochem, 1845 Fairmount St, Wichita, KS 67260 USA
[6] Kansas State Univ, Dept Chem, Lab Mechanist Glycobiol, Manhattan, KS 66506 USA
[7] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
Funding
National Science Foundation (USA);
Keywords
deep learning; N-linked glycosylation; post-translation modification; prediction; protein language model; SEQUENCE; BACTERIAL; SETS;
DOI
10.1093/glycob/cwad033
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by existing methods, especially with regard to creating negative sets and leveraging the distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. On a benchmark-independent test set, LMNglyPred achieves a sensitivity of 76.50%, a specificity of 75.36%, a precision of 60.99%, an accuracy of 75.74%, and a Matthews Correlation Coefficient of 0.49. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.
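The sequon rule stated in the abstract (N, then any residue except proline, then S or T) can be illustrated with a short Python sketch. This is not the LMNglyPred method itself, which classifies sequons using pLM embeddings; it only enumerates the candidate sites that such a predictor would then score. The function name `find_sequons` and the example sequence are hypothetical and chosen for illustration only.

```python
import re

# N-X-[S/T] with X != P. The lookahead makes matches zero-width, so
# overlapping sequons (e.g. the two in "NNST") are both reported.
# Assumption: the input uses one-letter amino acid codes; any character
# other than P is accepted in the X position.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return the 1-based positions of the Asn in every N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence: "NGT" at 2, "NNS" at 9, "NST" at 10;
    # "NPS" at 6 is excluded because X is proline.
    print(find_sequons("MNGTANPSNNSTK"))  # [2, 9, 10]
```

As the abstract notes, a matching sequon is necessary but not sufficient for glycosylation, so the positions returned here are only candidates, not predictions.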
Pages: 411-422
Page count: 12
Related papers
50 in total
  • [41] A pre-trained language model for emergency department intervention prediction using routine physiological data and clinical narratives
    Huang, Ting-Yun
    Chong, Chee-Fah
    Lin, Heng-Yu
    Chen, Tzu-Ying
    Chang, Yung-Chun
    Lin, Ming-Chin
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2024, 191
  • [42] POOE: predicting oomycete effectors based on a pre-trained large protein language model
    Zhao, Miao
    Lei, Chenping
    Zhou, Kewei
    Huang, Yan
    Fu, Chen
    Yang, Shiping
    Zhang, Ziding
    MSYSTEMS, 2024, 9 (01)
  • [43] Generation of Free Oligosaccharides from Bacterial Protein N-Linked Glycosylation Systems
    Dwivedi, Ritika
    Nothaft, Harald
    Reiz, Bela
    Whittal, Randy M.
    Szymanski, Christine M.
    BIOPOLYMERS, 2013, 99 (10) : 772 - 783
  • [44] Alzheimer Disease Recognition Using Speech-Based Embeddings From Pre-Trained Models
    Gauder, Lara
    Pepino, Leonardo
    Ferrer, Luciana
    Riera, Pablo
    INTERSPEECH 2021, 2021, : 3795 - 3799
  • [45] Identifying the sites and roles of N-linked glycosylation of the human type 1a metabotropic glutamate receptor
    Atkinson, PJ
    Selkirk, JV
    Price, GW
    Nahorski, SR
    Challiss, RAJ
    NEUROPHARMACOLOGY, 2002, 43 (02) : 274 - 274
  • [46] Role of individual N-linked glycosylation sites in the function and intracellular transport of the human α folate receptor
    Roberts, SJ
    Petropavlovskaja, M
    Chung, KN
    Knight, CB
    Elwood, PC
    ARCHIVES OF BIOCHEMISTRY AND BIOPHYSICS, 1998, 351 (02) : 227 - 235
  • [47] GENERATING HUMAN READABLE TRANSCRIPT FOR AUTOMATIC SPEECH RECOGNITION WITH PRE-TRAINED LANGUAGE MODEL
    Liao, Junwei
    Shi, Yu
    Gong, Ming
    Shou, Linjun
    Eskimez, Sefik
    Lu, Liyang
    Qu, Hong
    Zeng, Michael
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7578 - 7582
  • [48] Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes
    Phuc Vinh Nguyen Lam
    Radoslav Goldman
    Konstantinos Karagiannis
    Tejas Narsule
    Vahan Simonyan
    Valerii Soika
    Raja Mazumder
    Genomics, Proteomics & Bioinformatics, 2013, (02) : 96 - 104
  • [49] Mobile GUI test script generation from natural language descriptions using pre-trained model
    Li, Chun
    9TH IEEE/ACM INTERNATIONAL CONFERENCE ON MOBILE SOFTWARE ENGINEERING AND SYSTEMS, MOBILESOFT 2022, 2022, : 112 - 113
  • [50] Using ensemble SVM to identify human GPCRs N-linked glycosylation sites based on the general form of Chou's PseAAC
    Xie, Hua-Lin
    Fu, Liang
    Nie, Xi-Du
    PROTEIN ENGINEERING DESIGN & SELECTION, 2013, 26 (11) : 735 - 742