Unique function words characterize genomic proteins

Times cited: 5
Authors
Scaiewicz, Andrea [1]
Levitt, Michael [1]
Affiliation
[1] Stanford Univ, Sch Med, Dept Struct Biol, Stanford, CA 94305 USA
Funding
US National Science Foundation
Keywords
protein universe; genomic sequences; functional profiles; domain architecture; shared function; evolution; superfamilies; homology; database; universe; impact
DOI
10.1073/pnas.1801182115
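Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]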
Discipline classification codes
07; 0710; 09
Abstract
Between 2009 and 2016, the number of protein sequences from known species increased about 10-fold, from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the Conserved Domain Architecture Retrieval Tool (CDART) as a sequence motif. Motifs provide clues to biological function, but CDART often matches the same region of a protein with two or more profiles. Such synonyms complicate estimates of functional complexity. We perform full-linkage clustering of redundant profiles by finding maximum disjoint cliques: each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%, whereas the number of UFWs increased more slowly, by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures, SDAs) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs (sequences with multiple domain architectures, MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of "words," or UFWs (57% shared), their "sentences" (MDAs) are different (only 1.3% shared).
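The clustering step described in the abstract can be illustrated with a small Python sketch. It assumes the redundancy relation is already known as a list of profile pairs that match the same sequence region. The function names, the profile IDs, the greedy "remove the largest clique, then repeat" loop, and the rule of picking the smallest ID as the representative are all illustrative assumptions, not the authors' published code. Sequences are labeled SDA when their hits map to exactly one distinct UFW and MDA when they map to two or more, following the abstract's definitions. Finding a maximum clique is NP-hard in general, so the exhaustive Bron-Kerbosch search used here is only practical for modest graphs.

```python
"""Sketch: collapse synonymous sequence profiles into unique function words (UFWs)."""


def _bron_kerbosch(r, p, x, adj, out):
    """Enumerate all maximal cliques that extend r using p, excluding x."""
    if not p and not x:
        out.append(r)
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        _bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def maximum_clique(nodes, adj):
    """Return one largest clique among `nodes` (ties broken by sorted member IDs)."""
    cliques = []
    _bron_kerbosch(frozenset(), set(nodes), set(), adj, cliques)
    return min(cliques, key=lambda c: (-len(c), sorted(c)))


def disjoint_cliques(profiles, redundant_pairs):
    """Repeatedly remove a maximum clique until every profile is assigned.

    Every pair inside a clique is redundant, i.e. full (complete) linkage.
    Links to already-removed profiles stay in `adj` but are ignored, because
    the search only draws candidates from the remaining profiles.
    """
    adj = {p: set() for p in profiles}
    for a, b in redundant_pairs:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(profiles)
    clusters = []
    while remaining:
        clique = maximum_clique(remaining, adj)
        clusters.append(clique)
        remaining -= clique
    return clusters


def unique_function_words(profiles, redundant_pairs):
    """Map each profile ID to its cluster representative (its UFW)."""
    ufw_of = {}
    for clique in disjoint_cliques(profiles, redundant_pairs):
        representative = min(clique)  # hypothetical rule: smallest profile ID
        for prof in clique:
            ufw_of[prof] = representative
    return ufw_of


def classify(hits, ufw_of):
    """Label a sequence by the distinct UFWs its ordered profile hits map to."""
    words = tuple(dict.fromkeys(ufw_of[p] for p in hits))
    if not words:
        return "unmatched", words
    return ("SDA" if len(words) == 1 else "MDA"), words


if __name__ == "__main__":
    # Hypothetical profile IDs and redundancy links, for illustration only.
    profiles = ["A1", "A2", "A3", "B1", "C1"]
    pairs = [("A1", "A2"), ("A1", "A3"), ("A2", "A3"), ("A3", "B1")]
    ufw = unique_function_words(profiles, pairs)
    print(sorted(set(ufw.values())))         # ['A1', 'B1', 'C1']
    print(classify(["A1", "A2"], ufw))        # ('SDA', ('A1',))
    print(classify(["C1", "A2", "B1"], ufw))  # ('MDA', ('C1', 'A1', 'B1'))
```

Because every pair inside a cluster must be linked, profile B1, which is redundant with only A3, becomes its own word instead of being chained into the {A1, A2, A3} cluster; that is the practical difference between full linkage and single linkage.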
Pages: 6703-6708
Number of pages: 6