GENERALIST: A latent space based generative model for protein sequence families

Times cited: 1
Authors
Akl, Hoda [1]
Emison, Brooke [2]
Zhao, Xiaochuan [1]
Mondal, Arup [3]
Perez, Alberto [3]
Dixit, Purushottam D. [2,4]
Affiliations
[1] Univ Florida, Dept Phys, Gainesville, FL 33612 USA
[2] Yale Univ, Dept Biomed Engn, New Haven, CT 06520 USA
[3] Univ Florida, Dept Chem, Gainesville, FL USA
[4] Yale Univ, Syst Biol Inst, West Haven, CT 06520 USA
Keywords
EXPANSION;
DOI
10.1371/journal.pcbi.1011655
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Generative models of protein sequence families are an important tool in the repertoire of protein scientists and engineers alike. However, state-of-the-art generative approaches face inference-, accuracy-, and overfitting-related obstacles when modeling moderately sized to large proteins and/or protein families with low sequence coverage. Here, we present a simple-to-learn, tunable, and accurate generative model, GENERALIST: GENERAtive nonLInear tenSor-factorizaTion for protein sequences. GENERALIST accurately captures several high-order summary statistics of amino acid covariation. GENERALIST also predicts conservative, locally optimal sequences that are likely to fold into stable 3D structures. Importantly, unlike current methods, the density of sequences in GENERALIST-modeled sequence ensembles closely resembles that of the corresponding natural ensembles. Finally, GENERALIST embeds protein sequences in an informative latent space. GENERALIST will be an important tool to study protein sequence variability.

Protein sequence families show tremendous sequence variation. Yet, a large portion of the functional sequence space is thought to remain unexplored. Generative models are machine learning methods that allow us to learn what makes proteins functional from the sequences of naturally occurring proteins. Here, we present a new type of generative model, GENERALIST (GENERAtive nonLInear tenSor-factorizaTion for protein sequences), that is accurate, easy to implement, and works with very small datasets. We believe that GENERALIST will be an important tool in the repertoire of protein scientists and engineers alike.
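The abstract names the technique (a generative nonlinear tensor factorization with a latent embedding per sequence) but does not give its equations. The sketch below illustrates one plausible form of such a model and is an assumption, not the authors' published implementation: each integer-encoded sequence n gets a K-dimensional latent vector z_n, and the probability of residue a at position i is a softmax over a of sum_k z_nk * theta_kia. The fitting procedure (plain joint gradient ascent on the log-likelihood), the toy alphabet, and all function names (`fit`, `site_probs`, `sample`, `log_likelihood`) are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical sketch, NOT the published GENERALIST code.
# Sequences are integer-encoded (0..q-1); Z[n] is a length-K latent vector per
# sequence and theta[k][i][a] is a K x L x q weight tensor. The softmax link
# below is an illustrative assumption about the model's functional form.

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def site_probs(z, theta):
    """Per-position residue probabilities p[i][a] given one latent vector z."""
    K, L, q = len(theta), len(theta[0]), len(theta[0][0])
    return [softmax([sum(z[k] * theta[k][i][a] for k in range(K)) for a in range(q)])
            for i in range(L)]

def log_likelihood(seqs, Z, theta):
    """Sum over sequences and positions of log p(observed residue)."""
    total = 0.0
    for seq, z in zip(seqs, Z):
        p = site_probs(z, theta)
        total += sum(math.log(p[i][seq[i]]) for i in range(len(seq)))
    return total

def fit(seqs, q, K, steps=500, lr=0.02, seed=0):
    """Joint gradient ascent on the log-likelihood over Z and theta (assumed optimizer)."""
    rng = random.Random(seed)
    N, L = len(seqs), len(seqs[0])
    Z = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(N)]
    theta = [[[rng.gauss(0.0, 0.1) for _ in range(q)] for _ in range(L)] for _ in range(K)]
    for _ in range(steps):
        gZ = [[0.0] * K for _ in range(N)]
        gT = [[[0.0] * q for _ in range(L)] for _ in range(K)]
        for n, seq in enumerate(seqs):
            p = site_probs(Z[n], theta)
            for i in range(L):
                for a in range(q):
                    # d logL / d logit = one-hot(observed) - predicted probability
                    r = (1.0 if seq[i] == a else 0.0) - p[i][a]
                    for k in range(K):
                        gZ[n][k] += r * theta[k][i][a]
                        gT[k][i][a] += r * Z[n][k]
        # Apply both updates only after all gradients are computed (simultaneous step)
        for n in range(N):
            for k in range(K):
                Z[n][k] += lr * gZ[n][k]
        for k in range(K):
            for i in range(L):
                for a in range(q):
                    theta[k][i][a] += lr * gT[k][i][a]
    return Z, theta

def sample(z, theta, rng):
    """Draw one sequence: each position independently from its softmax given z."""
    q = len(theta[0][0])
    return [rng.choices(range(q), weights=p_i)[0] for p_i in site_probs(z, theta)]

if __name__ == "__main__":
    # Toy "family" over a 3-letter alphabet with two sub-groups (made-up data).
    toy = [[0, 1, 2, 0, 1], [0, 1, 2, 0, 1], [0, 1, 2, 1, 1],
           [2, 0, 1, 2, 0], [2, 0, 1, 2, 0], [2, 0, 0, 2, 0]]
    Z0, T0 = fit(toy, q=3, K=2, steps=0)  # same seed: initialization only
    Z1, T1 = fit(toy, q=3, K=2)           # same initialization, then trained
    print(log_likelihood(toy, Z0, T0), log_likelihood(toy, Z1, T1))
    print(sample(Z1[0], T1, random.Random(1)))
```

In this sketch the fitted rows of `Z` act as the per-sequence latent coordinates, and new sequences are generated by picking a latent vector and sampling positions independently. With a small K the parameter count grows as K*L*q + N*K, rather than the roughly L^2*q^2 couplings of a pairwise model; whether the published model uses exactly this form should be checked against the paper.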
Pages: 15