ParaAntiProt provides paratope prediction using antibody and protein language models

Cited by: 0
Authors
Kalemati, Mahmood [1]
Noroozi, Alireza [1]
Shahbakhsh, Aref [1]
Koohi, Somayyeh [1]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
Paratope prediction; Antibody Language models; Protein Language models; Complementarity determining regions; Deep learning;
DOI
10.1038/s41598-024-80940-y
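Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract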
Efficient paratope prediction holds great potential for improving antibody design, treating cancers and other serious diseases, and advancing personalized medicine. Although traditional methods are highly accurate, they are often time-consuming, labor-intensive, and reliant on 3D structures, which restricts their broader use. Machine learning-based methods, besides also relying on structural data, require descriptor computation, consideration of diverse physicochemical properties, and feature engineering. Here, we develop a deep learning-assisted, antigen-agnostic method for paratope identification that relies solely on amino acid sequences. Building on the ProtTrans architecture and using pre-trained protein and antibody language models, we extract embeddings for paratope prediction. By incorporating positional encoding for Complementarity Determining Regions (CDRs), our model gains a deeper structural understanding, achieving a ROC AUC of 0.904, an F1-score of 0.701, and an MCC of 0.585 on benchmark datasets. In addition to yielding accurate antibody paratope predictions, our method performs strongly on nanobody paratope prediction, achieving a ROC AUC of 0.912 and a PR AUC of 0.665 on the nanobody dataset. Notably, our approach outperforms structure-based prediction methods, with a PR AUC of 0.731. Ablation studies, which examine the contribution of each part of the model to the prediction task, show that the improvement gained by applying CDR positional encoding together with CNNs depends on the specific protein and antibody language models used. These results highlight the potential of our method to advance disease understanding and aid in the discovery of new diagnostics and antibody therapies.
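As an illustration of two of the per-residue metrics quoted in the abstract (F1-score and MCC), the following minimal Python sketch computes both from binary paratope labels. The function names and the toy labels are hypothetical and are not taken from the paper; the sketch only shows the standard definitions of the metrics, not the authors' evaluation code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = paratope residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical labels for an 8-residue stretch).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"F1  = {f1_score(tp, fp, fn):.3f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

Because paratope residues are typically a minority of all residues, MCC and F1 are more informative than plain accuracy here: both penalize a predictor that simply labels every residue as non-binding.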
Pages: 15
Related papers
50 in total
  • [31] Redefining antibody patent protection using paratope mapping and CDR-scanning
    Banik, Soma S. R.
    Deng, Xiaoxiang
    Davidson, Edgar
    Storz, Ulrich
    Doranz, Benjamin J.
    NATURE BIOTECHNOLOGY, 2025, 43 (02) : 170 - 174
  • [32] BepiPred-3.0: Improved B-cell epitope prediction using protein language models
    Clifford, Joakim Noddeskov
    Hoie, Magnus Haraldson
    Deleuran, Sebastian
    Peters, Bjoern
    Nielsen, Morten
    Marcatili, Paolo
    PROTEIN SCIENCE, 2022, 31 (12)
  • [33] Accurate prediction of antibody function and structure using bio-inspired antibody language model
    Jing, Hongtai
    Gao, Zhengtao
    Xu, Sheng
    Shen, Tao
    Peng, Zhangzhi
    He, Shwai
    You, Tao
    Ye, Shuang
    Lin, Wei
    Sun, Siqi
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (04)
  • [34] Clinical risk prediction using language models: benefits and considerations
    Acharya, Angeela
    Shrestha, Sulabh
    Chen, Anyi
    Conte, Joseph
    Avramovic, Sanja
    Sikdar, Siddhartha
    Anastasopoulos, Antonios
    Das, Sanmay
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024
  • [35] Prediction of Arabic Legal Rulings Using Large Language Models
    Ammar, Adel
    Koubaa, Anis
    Benjdira, Bilel
    Nacar, Omer
    Sibaee, Serry
    ELECTRONICS, 2024, 13 (04)
  • [36] University Student Dropout Prediction Using Pretrained Language Models
    Won, Hyun-Sik
    Kim, Min-Ji
    Kim, Dohyun
    Kim, Hee-Soo
    Kim, Kang-Min
    APPLIED SCIENCES-BASEL, 2023, 13 (12)
  • [37] Linguistics-based formalization of the antibody language as a basis for antibody language models
    Vu, Mai Ha
    Robert, Philippe A.
    Akbar, Rahmad
    Swiatczak, Bartlomiej
    Sandve, Geir Kjetil
    Haug, Dag Trygve Truslew
    Greiff, Victor
    NATURE COMPUTATIONAL SCIENCE, 2024, 4 (06) : 412 - 422
  • [38] Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review
    Chen, Jia-Ying
    Wang, Jing-Fu
    Hu, Yue
    Li, Xin-Hui
    Qian, Yu-Rong
    Song, Chao-Lin
    FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, 2025, 13
  • [39] Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning
    Xu, Shijie
    Onoda, Akira
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 64 (07) : 2901 - 2911
  • [40] Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning
    Xu, Shijie
    Onoda, Akira
    Journal of Chemical Information and Modeling, 2024, 64 (07) : 2901 - 2911