GENERALIST: A latent space based generative model for protein sequence families

Cited by: 1
Authors
Akl, Hoda [1]
Emison, Brooke [2]
Zhao, Xiaochuan [1]
Mondal, Arup [3]
Perez, Alberto [3]
Dixit, Purushottam D. [2, 4]
Affiliations
[1] Univ Florida, Dept Phys, Gainesville, FL 33612 USA
[2] Yale Univ, Dept Biomed Engn, New Haven, CT 06520 USA
[3] Univ Florida, Dept Chem, Gainesville, FL USA
[4] Yale Univ, Syst Biol Inst, West Haven, CT 06520 USA
Keywords
EXPANSION
DOI
10.1371/journal.pcbi.1011655
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Generative models of protein sequence families are an important tool in the repertoire of protein scientists and engineers alike. However, state-of-the-art generative approaches face obstacles related to inference, accuracy, and overfitting when modeling moderately sized to large proteins and/or protein families with low sequence coverage. Here, we present a simple-to-learn, tunable, and accurate generative model, GENERALIST: GENERAtive nonLInear tenSor-factorizaTion for protein sequences. GENERALIST accurately captures several high-order summary statistics of amino acid covariation. GENERALIST also predicts conservative locally optimal sequences that are likely to fold into stable 3D structures. Importantly, unlike current methods, the density of sequences in GENERALIST-modeled sequence ensembles closely resembles that of the corresponding natural ensembles. Finally, GENERALIST embeds protein sequences in an informative latent space. GENERALIST will be an important tool to study protein sequence variability.

Protein sequence families show tremendous sequence variation. Yet, it is thought that a large portion of the functional sequence space remains unexplored. Generative models are machine learning methods that allow us to learn what makes proteins functional using sequences of naturally occurring proteins. Here, we present a new type of generative model, GENERALIST: GENERAtive nonLInear tenSor-factorizaTion for protein sequences, that is accurate, easy to implement, and works with very small datasets. We believe that GENERALIST will be an important tool in the repertoire of protein scientists and engineers alike.
Pages: 15