Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides

Cited by: 2
Authors
Medina-Ortiz, David [1 ,2 ]
Contreras, Seba [3 ]
Fernandez, Diego [1 ]
Soto-Garcia, Nicole [1 ]
Moya, Ivan [1 ,4 ]
Cabas-Mora, Gabriel [1 ]
Olivera-Nappa, Alvaro [2 ,5 ]
Affiliations
[1] Univ Magallanes, Dept Ingn Comp, Punta Arenas 6210005, Chile
[2] Univ Chile, Ctr Biotechnol & Bioengn, CeBiB, Santiago 8370456, Chile
[3] Max Planck Inst Dynam & Self Org, Fassberg 17, D-37077 Gottingen, Germany
[4] Univ Magallanes, Dept Ingn Quim, Punta Arenas 6210005, Chile
[5] Univ Chile, Dept Ingn Quim Biotecnol & Mat, Santiago 8370456, Chile
Keywords
antimicrobial peptides; machine learning; protein language models; generative learning; peptide discovery; peptide design; PREDICTION; CLASSIFICATION; DESIGN;
DOI
10.3390/ijms25168851
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Peptides are bioactive molecules whose functional versatility in living organisms has led to successful applications in diverse fields. In recent years, the amount of data describing peptide sequences and function collected in open repositories has substantially increased, allowing the application of more complex computational models to study the relations between the peptide composition and function. This work introduces AMP-Detector, a sequence-based classification model for the detection of peptides' functional biological activity, focusing on accelerating the discovery and de novo design of potential antimicrobial peptides (AMPs). AMP-Detector introduces a novel sequence-based pipeline to train binary classification models, integrating protein language models and machine learning algorithms. This pipeline produced 21 models targeting antimicrobial, antiviral, and antibacterial activity, achieving average precision exceeding 83%. Benchmark analyses revealed that our models outperformed existing methods for AMPs and delivered comparable results for other biological activity types. Utilizing the Peptide Atlas, we applied AMP-Detector to discover over 190,000 potential AMPs and demonstrated that it is an integrative approach with generative learning to aid in de novo design, resulting in over 500 novel AMPs. The combination of our methodology, robust models, and a generative design strategy offers a significant advancement in peptide-based drug discovery and represents a pivotal tool for therapeutic applications.
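The abstract describes a pipeline that turns peptide sequences into numerical representations and trains binary classifiers on them, reporting precision as the headline metric. The following is a minimal sketch of that general shape only, not the AMP-Detector implementation. It makes several assumptions: amino acid frequencies stand in for protein language model embeddings, a nearest-centroid rule stands in for the machine learning algorithms, and the sequences, labels, and all function and class names are hypothetical and made up for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Map a peptide to a 20-dimensional amino acid frequency vector.

    Assumption: this is a simple stand-in for a protein language model
    embedding, used only so the sketch runs without external models.
    """
    counts = Counter(sequence.upper())
    length = max(len(sequence), 1)
    return [counts.get(aa, 0) / length for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroidClassifier:
    """Hypothetical binary classifier: label 1 = active, 0 = inactive."""

    def fit(self, sequences, labels):
        positives = [embed(s) for s, y in zip(sequences, labels) if y == 1]
        negatives = [embed(s) for s, y in zip(sequences, labels) if y == 0]
        self.pos_centroid = centroid(positives)
        self.neg_centroid = centroid(negatives)
        return self

    def predict(self, sequences):
        predictions = []
        for s in sequences:
            v = embed(s)
            closer_to_pos = distance(v, self.pos_centroid) < distance(v, self.neg_centroid)
            predictions.append(1 if closer_to_pos else 0)
        return predictions


def precision(y_true, y_pred):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy, made-up sequences (not real data): cationic-looking vs. acidic-looking.
train_sequences = ["KKLLKKLLKKLL", "RRWWRRWWKK", "GKKLLKKAKKLL",
                   "DDEEDDEESS", "EEDDSSEEDD", "SDEESDEEDD"]
train_labels = [1, 1, 1, 0, 0, 0]

model = NearestCentroidClassifier().fit(train_sequences, train_labels)

test_sequences = ["KLKKLLKKLK", "EDDSEEDDSE"]
test_labels = [1, 0]
predictions = model.predict(test_sequences)
print(predictions, precision(test_labels, predictions))
```

In a realistic setting, `embed` would be replaced by vectors from a pretrained protein language model and the classifier by a properly validated learning algorithm; the abstract does not specify which ones, so none are assumed here.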
Pages: 19