The generative capacity of probabilistic protein sequence models

Cited by: 21
Authors
McGee, Francisco [1, 2, 3]
Hauri, Sandro [4, 5]
Novinger, Quentin [2, 5]
Vucetic, Slobodan [4, 5]
Levy, Ronald M. [1, 3, 6, 7]
Carnevale, Vincenzo [2, 3]
Haldane, Allan [1, 7]
Affiliations
[1] Temple Univ, Ctr Biophys & Computat Biol, Philadelphia, PA 19122 USA
[2] Temple Univ, Inst Computat Mol Sci, Philadelphia, PA 19122 USA
[3] Temple Univ, Dept Biol, Philadelphia, PA 19122 USA
[4] Temple Univ, Ctr Hybrid Intelligence, Philadelphia, PA 19122 USA
[5] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
[6] Temple Univ, Dept Phys, Philadelphia, PA 19122 USA
[7] Temple Univ, Dept Chem, Philadelphia, PA 19122 USA
Funding
National Science Foundation (USA);
Keywords
KINASE FAMILY PROTEINS; COEVOLUTIONARY LANDSCAPE; ENSEMBLES; CAPTURE;
DOI
10.1038/s41467-021-26529-9
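Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes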
07 ; 0710 ; 09 ;
Abstract
Generative models have become increasingly popular in protein design, yet rigorous metrics that allow the comparison of these models are lacking. Here, the authors propose a set of such metrics and use them to compare three popular models.
Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model's generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE's lies between the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general.
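The contrast the abstract draws between a site-independent model and models that capture covariation can be illustrated with a minimal sketch. It is not the authors' code or their metric set: the toy alignment, the 4-letter alphabet, and the helper names `site_frequencies`, `pair_covariances`, and `sample_site_independent` are all assumptions made for illustration. The sketch fits a site-independent model to a toy alignment, samples from it, and compares the pairwise covariances C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b) of the samples with those of the toy data.

```python
import numpy as np

def site_frequencies(msa, q):
    """Single-site frequencies f_i(a), shape (L, q), from an integer-coded alignment."""
    n, L = msa.shape
    return np.stack([np.bincount(msa[:, i], minlength=q) / n for i in range(L)])

def pair_covariances(msa, q):
    """Pairwise covariances C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b), shape (L, L, q, q)."""
    n = msa.shape[0]
    onehot = np.eye(q)[msa]                                   # (n, L, q)
    f_i = onehot.mean(axis=0)                                 # (L, q)
    f_ij = np.einsum("nia,njb->ijab", onehot, onehot) / n     # (L, L, q, q)
    return f_ij - np.einsum("ia,jb->ijab", f_i, f_i)

def sample_site_independent(f, n, rng):
    """Draw n sequences, each column sampled independently from its own frequencies."""
    L, q = f.shape
    return np.stack([rng.choice(q, size=n, p=f[i]) for i in range(L)], axis=1)

rng = np.random.default_rng(0)
q, n = 4, 5000  # hypothetical 4-letter alphabet and toy sample size

# Toy "natural" alignment (an assumption): columns 0 and 1 always mutate together,
# column 2 varies freely.
col0 = rng.integers(0, q, size=n)
col2 = rng.integers(0, q, size=n)
target = np.stack([col0, col0, col2], axis=1)

# Fit the site-independent model (its parameters are just f_i(a)) and sample from it.
f = site_frequencies(target, q)
generated = sample_site_independent(f, n, rng)

max_freq_error = np.abs(site_frequencies(generated, q) - f).max()
cov_target = np.abs(pair_covariances(target, q)[0, 1]).max()
cov_generated = np.abs(pair_covariances(generated, q)[0, 1]).max()

print("max single-site frequency error:", max_freq_error)
print("max |C_01| in toy data:         ", cov_target)
print("max |C_01| in generated samples:", cov_generated)
```

In this toy setting the generated sequences reproduce the single-site frequencies closely, while the covariance between the coupled columns essentially vanishes. Judging a model on pairwise and higher-order statistics rather than on single-site statistics alone is the kind of comparison the abstract describes.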
Pages: 14
Related papers
50 in total
  • [31] A TALE OF THREE PROBABILISTIC FAMILIES: DISCRIMINATIVE, DESCRIPTIVE, AND GENERATIVE MODELS
    Wu, Ying Nian
    Gao, Ruiqi
    Han, Tian
    Zhu, Song-Chun
    QUARTERLY OF APPLIED MATHEMATICS, 2019, 77 (02) : 423 - 465
  • [32] Scalable and exact sampling method for probabilistic generative graph models
    Moreno, Sebastian
    Pfeiffer, Joseph J., III
    Neville, Jennifer
    DATA MINING AND KNOWLEDGE DISCOVERY, 2018, 32 (06) : 1561 - 1596
  • [33] Optimizing Probabilistic Models for Relational Sequence Learning
    Di Mauro, Nicola
    Basile, Teresa M. A.
    Ferilli, Stefano
    Esposito, Floriana
    FOUNDATIONS OF INTELLIGENT SYSTEMS, 2011, 6804 : 240 - 249
  • [34] Generative Quantum Machine Learning via Denoising Diffusion Probabilistic Models
    Zhang, Bingzhi
    Xu, Peng
    Chen, Xiaohui
    Zhuang, Quntao
    PHYSICAL REVIEW LETTERS, 2024, 132 (10)
  • [35] How to Trust Generative Probabilistic Models for Time-Series Data?
    Piatkowski, Nico
    Posch, Peter N.
    Krause, Miguel
    LEARNING AND INTELLIGENT OPTIMIZATION, LION 15, 2021, 12931 : 283 - 298
  • [36] GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables
    Huot, Mathieu
    Ghavami, Matin
    Lew, Alexander K.
    Schaechtle, Ulrich
    Freer, Cameron E.
    Shelby, Zane
    Rinard, Martin C.
    Saad, Feras A.
    Mansinghka, Vikash K.
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2024, 8 (PLDI):
  • [37] Towards developing probabilistic generative models for reasoning with natural language representations
    Marcu, D
    Popescu, AM
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 88 - 99
  • [38] Policy Optimization by Marginal-MAP Probabilistic Inference in Generative Models
    Kiselev, Igor
    Poupart, Pascal
    AAMAS'14: PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2014, : 1611 - 1612
  • [39] Problems using deep generative models for probabilistic audio source separation
    Frank, Maurice
    Ilse, Maximilian
    NEURIPS WORKSHOPS, 2020, 2020, 137 : 53 - 59
  • [40] Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification
    Zhong, Zilong
    Li, Jonathan
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 8191 - 8192