ParaAntiProt provides paratope prediction using antibody and protein language models

Authors
Kalemati, Mahmood [1]
Noroozi, Alireza [1]
Shahbakhsh, Aref [1]
Koohi, Somayyeh [1]
Affiliations
[1] Sharif University of Technology, Department of Computer Engineering, Tehran, Iran
Source
Scientific Reports | 2024 / Vol. 14 / Issue 01
Keywords
Paratope prediction; Antibody language models; Protein language models; Complementarity determining regions; Deep learning
DOI
10.1038/s41598-024-80940-y
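Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract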
Efficiently predicting the paratope holds immense potential for enhancing antibody design, treating cancers and other serious diseases, and advancing personalized medicine. Although traditional methods are highly accurate, they are often time-consuming, labor-intensive, and reliant on 3D structures, which restricts their broader use. Machine learning-based methods, besides relying on structural data, require descriptor computation, consideration of diverse physicochemical properties, and feature engineering. Here, we develop a deep learning-assisted method for paratope identification that relies solely on amino acid sequences and is antigen-agnostic. Built on the ProtTrans architecture and using pre-trained protein and antibody language models, it extracts embeddings for paratope prediction. By incorporating positional encoding for Complementarity Determining Regions (CDRs), the model gains a deeper structural understanding, achieving a ROC AUC of 0.904, an F1-score of 0.701, and an MCC of 0.585 on benchmark datasets. In addition to yielding accurate antibody paratope predictions, the method performs strongly on nanobody paratopes, achieving a ROC AUC of 0.912 and a PR AUC of 0.665 on the nanobody dataset. Notably, our approach outperforms structure-based prediction methods, with a PR AUC of 0.731. Ablation studies, which examine the impact of each part of the model on the prediction task, show that the improvement gained by applying CDR positional encoding together with CNNs depends on the specific protein and antibody language models used. These results highlight the potential of our method to advance disease understanding and aid in the discovery of new diagnostics and antibody therapies.
Pages: 15