Latent generative landscapes as maps of functional diversity in protein sequence space

Authors
Cheyenne Ziegler
Jonathan Martin
Claude Sinner
Faruck Morcos
Affiliations
[1] University of Texas at Dallas,Department of Biological Sciences
[2] University of Texas at Dallas,Department of Bioengineering
[3] University of Texas at Dallas,Center for Systems Biology
Abstract
Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we use direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We show how this landscape captures phylogenetic groupings and the functional and fitness properties of several systems, including globins, β-lactamases, ion channels, and transcription factors. We provide evidence that the landscape helps explain the effects of sequence variability observed in experimental data and offers insights into directed and natural protein evolution. We propose that combining the generative properties and functional predictive power of variational autoencoders with coevolutionary analysis could benefit applications in protein engineering and design.
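The construction described in the abstract (decode points of the latent space, then score each decoded sequence with a Potts Hamiltonian) can be illustrated with a minimal sketch. This is not the authors' implementation: the decoder `toy_decoder`, the random parameters `h`, `J`, and `W`, the grid limits, and the sign convention H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) are all assumptions made for illustration. In practice `h` and `J` would be inferred by direct coupling analysis from a multiple sequence alignment, and the decoder would be a trained variational autoencoder.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Potts Hamiltonian under the assumed convention
    H(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]."""
    L = len(seq)
    idx = np.arange(L)
    field = h[idx, seq].sum()
    # pair[i, j] = J[i, j, s_i, s_j] for every pair of positions
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    coupling = np.triu(pair, k=1).sum()  # keep only i < j
    return -(field + coupling)

def toy_decoder(z, W):
    """Hypothetical stand-in for a trained VAE decoder: maps a 2-D latent
    point to per-position logits and returns the most likely symbol."""
    logits = np.tensordot(z, W, axes=1)  # (2,) x (2, L, q) -> (L, q)
    return logits.argmax(axis=1)

def latent_landscape(decoder, h, J, lim=3.0, n=25):
    """Evaluate the Potts energy of the decoded sequence on an n x n grid
    covering [-lim, lim]^2 in latent space."""
    axis = np.linspace(-lim, lim, n)
    grid = np.empty((n, n))
    for a, x in enumerate(axis):
        for b, y in enumerate(axis):
            seq = decoder(np.array([x, y]))
            grid[a, b] = potts_energy(seq, h, J)
    return axis, grid

# Toy parameters (random, for illustration only).
rng = np.random.default_rng(0)
L, q = 8, 21                      # sequence length, alphabet size (20 aa + gap)
h = rng.normal(size=(L, q))       # local fields
J = rng.normal(scale=0.1, size=(L, L, q, q))  # pairwise couplings
W = rng.normal(size=(2, L, q))    # weights of the toy decoder

axis, landscape = latent_landscape(lambda z: toy_decoder(z, W), h, J)
```

Under this convention, lower values in `landscape` mark latent regions whose decoded sequences are more favorable according to the coevolutionary model; the array can be drawn as a heat map over the two latent coordinates.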