Sequence-structure-function relationships in the microbial protein universe

Cited by: 16
Authors
Leman, Julia Koehler [1 ,2 ]
Szczerbiak, Pawel [3 ]
Renfrew, P. Douglas [1 ,2 ]
Gligorijevic, Vladimir [1 ,4 ]
Berenberg, Daniel [1 ,4 ,5 ,6 ]
Vatanen, Tommi [7 ,8 ,9 ]
Taylor, Bryn C. [10 ]
Chandler, Chris [1 ]
Janssen, Stefan
Pataki, Andras
Carriero, Nick
Fisk, Ian
Xavier, Ramnik J. [7 ]
Knight, Rob [10 ]
Bonneau, Richard [1 ,2 ,4 ,5 ,6 ]
Kosciolek, Tomasz [3 ]
Affiliations
[1] Simons Fdn, Flatiron Inst, Ctr Computat Biol, New York, NY 10010 USA
[2] NYU, Dept Biol, New York, NY 10010 USA
[3] Jagiellonian Univ, Malopolska Ctr Biotechnol, Krakow, Poland
[4] Prescient Design, Genentech accelerator, New York, NY 10010 USA
[5] NYU, Ctr Data Sci, New York, NY 10011 USA
[6] NYU, Courant Inst Math Sci, Dept Comp Sci, New York, NY USA
[7] Broad Inst, Cambridge, MA USA
[8] Univ Auckland, Liggins Inst, Auckland, New Zealand
[9] Univ Helsinki, Fac Med, Res Program Clin & Mol Metab, Helsinki 00014, Finland
[10] Univ Calif San Diego, Dept Pediat, La Jolla, CA USA
Keywords
STRUCTURE SPACE; CONSENSUS PREDICTION; IMMUNE-SYSTEM; FOLD SPACE; REPRESENTATION; MULTIPLICITY; CONTINUITY; TOPOLOGY; COVERAGE; DOMAINS;
DOI
10.1038/s41467-023-37896-w
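Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract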
For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don't rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regard to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses. Advances in protein structure prediction have led to a significant influx of protein structure data. Here the authors exploit this data to offer an unbiased overview of complex sequence-structure-function relationships in the protein universe. This work opens up new uses for 3D structure data repositories in meta-omics and other fields of biology.
Pages: 11