The generative capacity of probabilistic protein sequence models

Cited by: 21
Authors
McGee, Francisco [1, 2, 3]
Hauri, Sandro [4, 5]
Novinger, Quentin [2, 5]
Vucetic, Slobodan [4, 5]
Levy, Ronald M. [1, 3, 6, 7]
Carnevale, Vincenzo [2, 3]
Haldane, Allan [1, 7]
Affiliations
[1] Temple Univ, Ctr Biophys & Computat Biol, Philadelphia, PA 19122 USA
[2] Temple Univ, Inst Computat Mol Sci, Philadelphia, PA 19122 USA
[3] Temple Univ, Dept Biol, Philadelphia, PA 19122 USA
[4] Temple Univ, Ctr Hybrid Intelligence, Philadelphia, PA 19122 USA
[5] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
[6] Temple Univ, Dept Phys, Philadelphia, PA 19122 USA
[7] Temple Univ, Dept Chem, Philadelphia, PA 19122 USA
Funding
US National Science Foundation;
Keywords
KINASE FAMILY PROTEINS; COEVOLUTIONARY LANDSCAPE; ENSEMBLES; CAPTURE;
DOI
10.1038/s41467-021-26529-9
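Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;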
Abstract
Generative models have become increasingly popular in protein design, yet rigorous metrics that allow the comparison of these models are lacking. Here, the authors propose a set of such metrics and use them to compare three popular models.

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model's generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE's lies between the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general.
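The idea of comparing statistics of generated sequences against those of natural sequences can be illustrated with a minimal sketch. It is not the authors' code or their metric set. It assumes a toy three-column alignment and uses one simple statistic, the pairwise connected correlation C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b). All function names (`site_frequencies`, `sample_independent`, `pair_covariance`) and the toy data are hypothetical. The sketch fits a site-independent model, samples from it, and checks whether the samples reproduce the covariation present in the input.

```python
import random
from collections import Counter

def site_frequencies(msa):
    """Per-column residue frequencies f_i(a) of an alignment (list of equal-length strings)."""
    n = len(msa)
    return [{a: c / n for a, c in Counter(col).items()} for col in zip(*msa)]

def sample_independent(freqs, n_samples, rng):
    """Sample sequences column by column from f_i(a), ignoring all couplings between sites."""
    return [
        "".join(
            rng.choices(list(f.keys()), weights=list(f.values()))[0]
            for f in freqs
        )
        for _ in range(n_samples)
    ]

def pair_covariance(msa, i, j, a, b):
    """Connected correlation C_ij(a,b) = f_ij(a,b) - f_i(a) * f_j(b)."""
    n = len(msa)
    f_ij = sum(1 for s in msa if s[i] == a and s[j] == b) / n
    f_i = sum(1 for s in msa if s[i] == a) / n
    f_j = sum(1 for s in msa if s[j] == b) / n
    return f_ij - f_i * f_j

# Toy "natural" alignment (an assumption for illustration): columns 0 and 1
# are perfectly coupled (always "AA" or "CC"); column 2 varies independently.
rng = random.Random(0)
natural = [rng.choice(["AA", "CC"]) + rng.choice("GT") for _ in range(2000)]

# Fit the site-independent model and generate the same number of sequences.
freqs = site_frequencies(natural)
generated = sample_independent(freqs, 2000, rng)

c_nat = pair_covariance(natural, 0, 1, "A", "A")
c_gen = pair_covariance(generated, 0, 1, "A", "A")
print(f"natural   C_01(A,A) = {c_nat:.3f}")
print(f"generated C_01(A,A) = {c_gen:.3f}")
```

In this toy setting the natural alignment has a covariance of roughly 0.25 between the coupled columns, while the site-independent samples give a value near zero even though their single-site frequencies match the input. A model with greater generative capacity in the abstract's sense would be expected to close that gap, and the gap at higher orders as well.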
Pages: 14