Estimating the number of protein folds and families from complete genome data

Cited by: 142
Authors
Wolf, YI
Grishin, NV
Koonin, EV
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[2] Russian Acad Sci, Inst Cytol & Genet, Novosibirsk 630090, Russia
Keywords
protein structure classification; structural genomics; sampling; logarithmic distribution
DOI
10.1006/jmbi.2000.3786
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Using the data on proteins encoded in complete genomes, combined with a rigorous theory of the sampling process, we estimate the total number of protein folds and families, as well as the number of folds and families in each genome. The total number of folds in globular, water-soluble proteins is estimated at about 1000, with structural information currently available for about one-third of that number. The sequenced genomes of unicellular organisms encode from approximately 25%, for the minimal genomes of the Mycoplasmas, to 70-80% for larger genomes, such as Escherichia coli and yeast, of the total number of folds. The number of protein families with significant sequence conservation was estimated to be between 4000 and 7000, with structures available for about 20% of these. © 2000 Academic Press.
Pages: 897 - 905
Page count: 9
Related papers
50 in total
  • [31] Estimating the basic reproduction number from surveillance data on past epidemics
    Froda, Sorana
    Leduc, Hugues
    [J]. MATHEMATICAL BIOSCIENCES, 2014, 256 : 89 - 101
  • [32] Estimating the Number of Induced Subgraphs from Incomplete Data and Neighborhood Queries
    Fotakis, Dimitris
    Pittas, Thanasis
    Skoulakis, Stratis
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 4045 - 4053
  • [33] Sequential clustering with particle filters - Estimating the number of clusters from data
    Schubert, J
    Sidenbladh, H
    [J]. 2005 7TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), VOLS 1 AND 2, 2005, : 122 - 129
  • [34] Estimating the size of neural networks from the number of available training data
    Lappas, Georgios
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2007, PT 1, PROCEEDINGS, 2007, 4668 : 68 - 77
  • [35] Deuterated protein folds obtained directly from unassigned nuclear overhauser effect data
    Bermejo, Guillermo A.
    Llinás, Miguel
    [J]. Journal of the American Chemical Society, 2008, 130 (12) : 3797 - 3805
  • [36] Rapid determination of protein folds from orientation-dependent NMR data.
    Prestegard, JH
    Fowler, CA
    Tian, F
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2000, 219 : U350 - U350
  • [37] Protein families and TRIBES in genome sequence space
    Enright, AJ
    Kunin, V
    Ouzounis, CA
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (15) : 4632 - 4638
  • [38] Deuterated protein folds obtained directly from unassigned nuclear overhauser effect data
    Bermejo, Guillermo A.
    Llinas, Miguel
    [J]. JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2008, 130 (12) : 3797 - 3805
  • [39] From complete genome sequence to 'complete' understanding?
    Galperin, Michael Y.
    Koonin, Eugene V.
    [J]. TRENDS IN BIOTECHNOLOGY, 2010, 28 (08) : 398 - 406
  • [40] ESTIMATING AND TESTING A COMPLETE SYSTEM OF DEMAND-FUNCTIONS FROM REGIONAL DATA
    MCCONNELL, KE
    [J]. APPLIED ECONOMICS, 1978, 10 (02) : 93 - 104