GENERALIST: A latent space based generative model for protein sequence families

Times cited: 1
Authors
Akl, Hoda [1]
Emison, Brooke [2]
Zhao, Xiaochuan [1]
Mondal, Arup [3]
Perez, Alberto [3]
Dixit, Purushottam D. [2,4]
Affiliations
[1] Univ Florida, Dept Phys, Gainesville, FL 33612 USA
[2] Yale Univ, Dept Biomed Engn, New Haven, CT 06520 USA
[3] Univ Florida, Dept Chem, Gainesville, FL USA
[4] Yale Univ, Syst Biol Inst, West Haven, CT 06520 USA
Keywords
EXPANSION;
DOI
10.1371/journal.pcbi.1011655
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Generative models of protein sequence families are an important tool in the repertoire of protein scientists and engineers alike. However, state-of-the-art generative approaches face obstacles related to inference, accuracy, and overfitting when modeling moderately sized to large proteins and/or protein families with low sequence coverage. Here, we present a simple-to-learn, tunable, and accurate generative model, GENERALIST: GENERAtive nonLInear tenSor-factorizaTion for protein sequences. GENERALIST accurately captures several high-order summary statistics of amino acid covariation. GENERALIST also predicts conservative, locally optimal sequences that are likely to fold into stable 3D structures. Importantly, unlike current methods, the density of sequences in GENERALIST-modeled sequence ensembles closely resembles that of the corresponding natural ensembles. Finally, GENERALIST embeds protein sequences in an informative latent space. GENERALIST will be an important tool for studying protein sequence variability.

Protein sequence families show tremendous sequence variation. Yet a large portion of the functional sequence space is thought to remain unexplored. Generative models are machine learning methods that allow us to learn what makes proteins functional from the sequences of naturally occurring proteins. Here, we present a new type of generative model, GENERALIST (GENERAtive nonLInear tenSor-factorizaTion for protein sequences), that is accurate, easy to implement, and works with very small datasets. We believe that GENERALIST will be an important tool in the repertoire of protein scientists and engineers alike.
Pages: 15