Unsupervised evolution of protein and antibody complexes with a structure-informed language model

Cited by: 11
Authors
Shanker, Varun R. [1 ,2 ,3 ]
Bruun, Theodora U. J. [2 ,3 ,4 ]
Hie, Brian L. [3 ,4 ,6 ,7 ,8 ]
Kim, Peter S. [3 ,4 ,5 ]
Affiliations
[1] Stanford Univ, Sch Med, Stanford Biophys Program, Stanford, CA 94305 USA
[2] Stanford Univ, Sch Med, Stanford Med Scientist Training Program, Stanford, CA 94305 USA
[3] Stanford Univ, Sarafan ChEM-H, Stanford, CA 94305 USA
[4] Stanford Univ, Sch Med, Dept Biochem, Stanford, CA 94305 USA
[5] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
[6] Stanford Univ, Dept Chem Engn, Stanford, CA 94305 USA
[7] Stanford Univ, Stanford Data Sci, Stanford, CA 94305 USA
[8] Arc Inst, Palo Alto, CA 94304 USA
Keywords
FITNESS LANDSCAPES; SEQUENCE; DESIGN; SELECTION; RECOGNITION; INHIBITION; GENERATION; REVEALS; SET;
DOI
10.1126/science.adk8946
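Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];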
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
Pages: 46-53
Number of pages: 8