Structure of the space of folding protein sequences defined by large language models

Authors
Zambon, A. [1, 2]
Zecchina, R. [3]
Tiana, G. [1, 2, 4]
Affiliations
[1] Univ Milan, Dept Phys, Via Celoria 16, I-20133 Milan, Italy
[2] Univ Milan, Ctr Complex & Biosyst, Via Celoria 16, I-20133 Milan, Italy
[3] Bocconi Univ, Via Roentgen 1, I-20136 Milan, Italy
[4] Sez Milano, INFN, Via Celoria 16, I-20133 Milan, Italy
Keywords
energy landscape; protein evolution; canonical-ensemble sampling; machine learning; evolution
DOI
10.1088/1478-3975/ad205c
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Proteins populate a manifold in the high-dimensional sequence space whose geometrical structure guides their natural evolution. Leveraging recently developed structure prediction tools based on transformer models, we first examine the protein sequence landscape as defined by an effective energy that is a proxy for sequence foldability. This landscape shares characteristics with optimization challenges encountered in machine learning and constraint satisfaction problems. Our analysis reveals that natural proteins predominantly reside in wide, flat minima within this energy landscape. To investigate further, we employ statistical mechanics algorithms specifically designed to explore regions with high local entropy in relatively flat landscapes. Our findings indicate that these specialized algorithms can identify valleys with higher entropy than those found using traditional methods such as Markov chain Monte Carlo. In a proof-of-concept case, we find that these highly entropic minima exhibit significant similarities to natural sequences, especially in key sites and local entropy. Additionally, evaluations through Molecular Dynamics suggest that the stability of these sequences closely resembles that of natural proteins. Our tool combines advancements in machine learning and statistical physics, providing new insights into the exploration of sequence landscapes where wide, flat minima coexist alongside a majority of narrower minima.
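The contrast drawn in the abstract between canonical-ensemble (Metropolis) sampling and the "flatness" of a minimum can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_energy` is a hypothetical stand-in (a Hamming distance to a target string), not the transformer-based effective energy of the paper, and `flatness` is only a crude single-mutation proxy for local entropy, not the authors' algorithm.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_energy(seq, target):
    """Hypothetical effective energy: number of sites that differ from `target`."""
    return sum(a != b for a, b in zip(seq, target))


def metropolis(energy, seq, beta, steps, rng):
    """Canonical-ensemble sampling at inverse temperature `beta` using single-site mutations."""
    seq = list(seq)
    e = energy(seq)
    for _ in range(steps):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice(ALPHABET)
        e_new = energy(seq)
        # Metropolis rule: always accept downhill moves, accept uphill with Boltzmann weight.
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new
        else:
            seq[i] = old  # reject the move and restore the previous residue
    return "".join(seq), e


def flatness(energy, seq, tol):
    """Fraction of single-site mutants whose energy stays within `tol` of the energy of `seq`.

    Assumption: used here as a rough proxy for how wide and flat a minimum is.
    """
    e0 = energy(seq)
    ok = total = 0
    for i in range(len(seq)):
        for a in ALPHABET:
            if a == seq[i]:
                continue
            mutant = seq[:i] + a + seq[i + 1:]
            total += 1
            ok += energy(mutant) <= e0 + tol
    return ok / total


if __name__ == "__main__":
    rng = random.Random(0)
    target = "ACDEFGHIKL"  # hypothetical reference sequence
    energy = lambda s: toy_energy(s, target)
    start = "".join(rng.choice(ALPHABET) for _ in target)
    best, e = metropolis(energy, start, beta=2.0, steps=2000, rng=rng)
    print(best, e, flatness(energy, best, tol=1))
```

In this toy landscape every minimum is equally narrow, so the sketch only shows the mechanics: a sampler driven by energy alone does not distinguish wide minima from narrow ones, which is why a separate measure of the surrounding volume of low-energy sequences is needed.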
Pages: 12