Estimating the number of protein folds and families from complete genome data

Cited by: 142
Authors
Wolf, YI
Grishin, NV
Koonin, EV
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[2] Russian Acad Sci, Inst Cytol & Genet, Novosibirsk 630090, Russia
Keywords
protein structure classification; structural genomics; sampling; logarithmic distribution
DOI
10.1006/jmbi.2000.3786
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Using the data on proteins encoded in complete genomes, combined with a rigorous theory of the sampling process, we estimate the total number of protein folds and families, as well as the number of folds and families in each genome. The total number of folds in globular, water-soluble proteins is estimated at about 1000, with structural information currently available for about one-third of that number. The sequenced genomes of unicellular organisms encode from approximately 25% of the total number of folds, for the minimal genomes of the Mycoplasmas, to 70-80% for larger genomes, such as Escherichia coli and yeast. The number of protein families with significant sequence conservation was estimated to be between 4000 and 7000, with structures available for about 20% of these. (C) 2000 Academic Press.
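The percentages in the abstract can be turned into rough absolute counts. The following is a minimal arithmetic sketch, not the authors' sampling model: it only multiplies the rounded figures quoted above, and the helper name `share` and the variable names are illustrative assumptions.

```python
# Figures quoted in the abstract (rounded estimates, not exact values).
TOTAL_FOLDS = 1000            # "about 1000" folds in globular, water-soluble proteins
FAMILY_RANGE = (4000, 7000)   # estimated range for the number of protein families


def share(total, fraction):
    """Return the absolute count corresponding to a fraction of a total (hypothetical helper)."""
    return round(total * fraction)


# Folds with structural information available: about one-third of the total.
known_folds = share(TOTAL_FOLDS, 1 / 3)

# Folds encoded by minimal genomes (Mycoplasmas): approximately 25% of the total.
mycoplasma_folds = share(TOTAL_FOLDS, 0.25)

# Folds encoded by larger genomes (E. coli, yeast): 70-80% of the total.
large_genome_folds = (share(TOTAL_FOLDS, 0.70), share(TOTAL_FOLDS, 0.80))

# Families with structures available: about 20% of the 4000-7000 range.
known_families = tuple(share(n, 0.20) for n in FAMILY_RANGE)

print(known_folds, mycoplasma_folds, large_genome_folds, known_families)
```

Read this way, the abstract implies structures for roughly a few hundred folds and on the order of a thousand families; these are back-of-the-envelope conversions of the quoted estimates only.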
Pages: 897-905
Page count: 9
Related papers
50 in total
  • [1] Estimating the number of protein folds
    Zhang, CT
    DeLisi, C
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1998, 284 (05) : 1301 - 1305
  • [2] Estimating the total number of protein folds
    Govindarajan, S
    Recabarren, R
    Goldstein, RA
    [J]. PROTEINS-STRUCTURE FUNCTION AND GENETICS, 1999, 35 (04) : 408 - 414
  • [3] The number of protein folds and their distribution over families in nature
    Liu, XS
    Fan, K
    Wang, W
    [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2004, 54 (03) : 491 - 499
  • [4] Relations of the numbers of protein sequences, families and folds
    Zhang, CT
    [J]. PROTEIN ENGINEERING, 1997, 10 (07) : 757 - 761
  • [5] Protein folds and families: sequence and structure alignments
    Holm, L
    Sander, C
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 244 - 247
  • [6] A limited universe of membrane protein families and folds
    Oberai, Amit
    Ihm, Yungok
    Kim, Sanguk
    Bowie, James U.
    [J]. PROTEIN SCIENCE, 2006, 15 (07) : 1723 - 1734
  • [7] ESTIMATING GENE NUMBER AND GENOME COMPLEXITY
    SOLIGNAC, M
    GENERMONT, J
    [J]. ANNEE BIOLOGIQUE, 1982, 21 (03) : 209 - 273
  • [8] Molecular size scaling in families of protein native folds
    Parker Rogerson
    Gustavo A. Arteca
    [J]. Journal of Mathematical Chemistry, 2011, 49
  • [9] Uncovering new families and folds in the natural protein universe
    Durairaj, Janani
    Waterhouse, Andrew M.
    Mets, Toomas
    Brodiazhenko, Tetiana
    Abdullah, Minhal
    Studer, Gabriel
    Tauriello, Gerardo
    Akdel, Mehmet
    Andreeva, Antonina
    Bateman, Alex
    Tenson, Tanel
    Hauryliuk, Vasili
    Schwede, Torsten
    Pereira, Joana
    [J]. NATURE, 2023, 622 (7983) : 646 - +
  • [10] Molecular size scaling in families of protein native folds
    Rogerson, Parker
    Arteca, Gustavo A.
    [J]. JOURNAL OF MATHEMATICAL CHEMISTRY, 2011, 49 (08) : 1493 - 1506