Using protein language models for protein interaction hot spot prediction with limited data

Cited by: 3
Authors
Sargsyan, Karen [1]
Lim, Carmay [1]
Affiliations
[1] Acad Sinica, Inst Biomed Sci, Taipei 115, Taiwan
Keywords
Protein language models; ESM-2; Protein-protein interaction; PPI-hotspot; Small datasets; Feature selection; BINDING; CONSURF;
DOI
10.1186/s12859-024-05737-2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Background: Protein language models, inspired by the success of large language models in deciphering human language, have emerged as powerful tools for unraveling the intricate code of life inscribed within protein sequences. They have gained significant attention for their promising applications across various areas, including the sequence-based prediction of secondary and tertiary protein structure, the discovery of new functional protein sequences/folds, and the assessment of mutational impact on protein fitness. However, their utility in learning to predict protein residue properties based on scant datasets, such as protein-protein interaction (PPI)-hotspots whose mutations significantly impair PPIs, remained unclear. Here, we explore the feasibility of using protein language-learned representations as features for machine learning to predict PPI-hotspots using a dataset containing 414 experimentally confirmed PPI-hotspots and 504 PPI-nonhot spots.
Results: Our findings showcase the capacity of unsupervised learning with protein language models in capturing critical functional attributes of protein residues derived from the evolutionary information encoded within amino acid sequences. We show that methods relying on protein language models can compete with methods employing sequence and structure-based features to predict PPI-hotspots from the free protein structure. We observed an optimal number of features for model precision, suggesting a balance between information and overfitting.
Conclusions: This study underscores the potential of transformer-based protein language models to extract critical knowledge from sparse datasets, exemplified here by the challenging realm of predicting PPI-hotspots. These models offer a cost-effective and time-efficient alternative to traditional experimental methods for predicting certain residue properties. However, the challenge of explaining why specific features are important for determining certain residue properties remains.
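The abstract's observation that precision peaks at some intermediate number of features can be illustrated with a small sketch. This is not the authors' pipeline: the data below are synthetic stand-ins for per-residue embedding vectors, and the ranking score, the nearest-centroid classifier, and all function names (`make_toy_data`, `rank_features`, `centroid_predict`, `cv_precision`) are assumptions chosen only to keep the example dependency-free. It sweeps the number of selected features and reports cross-validated precision for each, redoing the feature ranking inside every fold so the held-out residues never influence the selection.

```python
import random
from statistics import mean, pstdev


def make_toy_data(n_per_class=60, n_features=40, n_informative=5, seed=0):
    """Synthetic stand-in for embeddings: only the first n_informative
    features differ between the two classes (hypothetical data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(1.0 * label if j < n_informative else 0.0, 1.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Order feature indices by |difference of class means| / overall std."""
    scores = []
    for j in range(len(X[0])):
        col0 = [x[j] for x, lab in zip(X, y) if lab == 0]
        col1 = [x[j] for x, lab in zip(X, y) if lab == 1]
        spread = pstdev(col0 + col1) or 1.0
        scores.append(abs(mean(col1) - mean(col0)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_predict(X_train, y_train, X_test, feats):
    """Nearest-centroid classifier restricted to the selected features."""
    cents = {}
    for lab in (0, 1):
        rows = [x for x, l in zip(X_train, y_train) if l == lab]
        cents[lab] = [mean(r[j] for r in rows) for j in feats]
    preds = []
    for x in X_test:
        dist = {lab: sum((x[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
                for lab in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_precision(X, y, k_features, n_folds=5, seed=0):
    """Pooled cross-validated precision for class 1; features are
    re-ranked on the training part of every fold to avoid leakage."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    tp = fp = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        feats = rank_features(X_tr, y_tr)[:k_features]
        preds = centroid_predict(X_tr, y_tr, [X[i] for i in test], feats)
        for i, p in zip(test, preds):
            if p == 1:
                tp += y[i] == 1
                fp += y[i] == 0
    return tp / (tp + fp) if tp + fp else 0.0


X, y = make_toy_data()
precision_by_k = {k: cv_precision(X, y, k) for k in (1, 2, 5, 10, 20, 40)}
best_k = max(precision_by_k, key=precision_by_k.get)
```

Printing `precision_by_k` shows how precision changes as more features are admitted on this toy data. With real embeddings, the same sweep would be run over the embedding dimensions, and the specific scores here say nothing about the paper's reported results.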
Pages: 10