Sequence-structure-function relationships in the microbial protein universe

Times cited: 16
Authors
Leman, Julia Koehler [1,2]
Szczerbiak, Pawel [3]
Renfrew, P. Douglas [1,2]
Gligorijevic, Vladimir [1,4]
Berenberg, Daniel [1,4,5,6]
Vatanen, Tommi [7,8,9]
Taylor, Bryn C. [10]
Chandler, Chris [1]
Janssen, Stefan
Pataki, Andras
Carriero, Nick
Fisk, Ian
Xavier, Ramnik J. [7]
Knight, Rob [10]
Bonneau, Richard [1,2,4,5,6]
Kosciolek, Tomasz [3]
Affiliations
[1] Simons Fdn, Flatiron Inst, Ctr Computat Biol, New York, NY 10010 USA
[2] NYU, Dept Biol, New York, NY 10010 USA
[3] Jagiellonian Univ, Malopolska Ctr Biotechnol, Krakow, Poland
[4] Prescient Design, Genentech accelerator, New York, NY 10010 USA
[5] NYU, Ctr Data Sci, New York, NY 10011 USA
[6] NYU, Courant Inst Math Sci, Dept Comp Sci, New York, NY USA
[7] Broad Inst, Cambridge, MA USA
[8] Univ Auckland, Liggins Inst, Auckland, New Zealand
[9] Univ Helsinki, Fac Med, Res Program Clin & Mol Metab, Helsinki 00014, Finland
[10] Univ Calif San Diego, Dept Pediat, La Jolla, CA USA
Keywords
STRUCTURE SPACE; CONSENSUS PREDICTION; IMMUNE-SYSTEM; FOLD SPACE; REPRESENTATION; MULTIPLICITY; CONTINUITY; TOPOLOGY; COVERAGE; DOMAINS
DOI
10.1038/s41467-023-37896-w
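Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Discipline classification codes
07; 0710; 09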
Abstract
For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that do not rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database with regard to domains of life, sequence diversity, and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function-based meta-omics analyses.

Advances in protein structure prediction have led to a significant influx of protein structure data. Here the authors exploit this data to offer an unbiased overview of complex sequence-structure-function relationships in the protein universe. This work opens up new uses for 3D structure data repositories in meta-omics and other fields of biology.
Pages: 11
Journal
Nature Communications, 14