Identification and distribution of protein families in 120 completed genomes using Gene3D

Cited by: 23
Authors
Lee, D [1]
Grant, A [1]
Marsden, RL [1]
Orengo, C [1]
Affiliations
[1] UCL, Dept Biochem, Biomol Struct & Modeling Grp, London WC1E 6BT, England
Keywords
protein domain architecture; domain partnerships; structural genomics
DOI
10.1002/prot.20409
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to ~70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While ~50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g., COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/. © 2005 Wiley-Liss, Inc.
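The complete linkage step mentioned in the abstract can be illustrated with a minimal sketch. This is not the PFscape implementation: the function name `complete_linkage_clusters`, the toy identity values, and the thresholds below are all hypothetical and chosen only to show how complete linkage differs from looser clustering criteria.

```python
from itertools import combinations


def complete_linkage_clusters(ids, identity, threshold):
    """Greedy agglomerative clustering with complete linkage.

    ids       -- list of sequence identifiers (hypothetical toy input)
    identity  -- dict mapping frozenset({a, b}) to percent sequence identity
    threshold -- minimum identity required between ALL members of a cluster
    """
    clusters = [{i} for i in ids]

    def linkage(a, b):
        # Complete linkage: two clusters are only as similar as
        # their least similar pair of members.
        return min(identity[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest complete-linkage score.
        best_score, (i, j) = max(
            (
                (linkage(clusters[i], clusters[j]), (i, j))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda t: t[0],
        )
        if best_score < threshold:
            break  # no remaining pair satisfies the identity cutoff
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]

    return sorted(sorted(c) for c in clusters)


# Toy pairwise identities (illustrative values, not data from the paper).
toy_identity = {
    frozenset(("A", "B")): 90,
    frozenset(("A", "C")): 40,
    frozenset(("B", "C")): 80,
    frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10,
    frozenset(("C", "D")): 10,
}

strict = complete_linkage_clusters(["A", "B", "C", "D"], toy_identity, 60)
loose = complete_linkage_clusters(["A", "B", "C", "D"], toy_identity, 35)
```

In this toy example, C is 80% identical to B but only 40% identical to A, so at a cutoff of 60 it stays out of the A/B cluster; every member of a complete linkage cluster must meet the cutoff against every other member.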
Pages: 603-615
Page count: 13
Related papers
50 in total
  • [1] Gene3D: comprehensive structural and functional annotation of genomes
    Yeats, Corin
    Lees, Jonathan
    Reid, Adam
    Kellam, Paul
    Martin, Nigel
    Liu, Xinhui
    Orengo, Christine
    NUCLEIC ACIDS RESEARCH, 2008, 36 : D414 - D418
  • [2] Gene3D: merging structure and function for a Thousand genomes
    Lees, Jonathan
    Yeats, Corin
    Redfern, Oliver
    Clegg, Andrew
    Orengo, Christine
    NUCLEIC ACIDS RESEARCH, 2010, 38 : D296 - D300
  • [3] Gene3D: modelling protein structure, function and evolution
    Yeats, Corin
    Maibaum, Michael
    Marsden, Russell
    Dibley, Mark
    Lee, David
    Addou, Sarah
    Orengo, Christine A.
    NUCLEIC ACIDS RESEARCH, 2006, 34 : D281 - D284
  • [4] Gene3D: Structural assignment for whole genes and genomes using the CATH domain structure database
    Buchan, DWA
    Shepherd, AJ
    Lee, D
    Pearl, FMG
    Rison, SCG
    Thornton, JM
    Orengo, CA
    GENOME RESEARCH, 2002, 12 (03) : 503 - 514
  • [5] Predicting protein function with hierarchical phylogenetic profiles: The Gene3D phylo-tuner method applied to eukaryotic Genomes
    Ranea, Juan A. G.
    Yeats, Corin
    Grant, Alastair
    Orengo, Christine A.
    PLOS COMPUTATIONAL BIOLOGY, 2007, 3 (11) : 2366 - 2378
  • [6] Gene3D: expanding the utility of domain assignments
    Lam, Su Datt
    Dawson, Natalie L.
    Das, Sayoni
    Sillitoe, Ian
    Ashford, Paul
    Lee, David
    Lehtinen, Sonja
    Orengo, Christine A.
    Lees, Jonathan G.
    NUCLEIC ACIDS RESEARCH, 2016, 44 (D1) : D404 - D409
  • [7] Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis
    Lees, Jonathan G.
    Lee, David
    Studer, Romain A.
    Dawson, Natalie L.
    Sillitoe, Ian
    Das, Sayoni
    Yeats, Corin
    Dessailly, Benoit H.
    Rentzsch, Robert
    Orengo, Christine A.
    NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) : D240 - D245
  • [8] Gene3D: structural assignments for the biologist and bioinformaticist alike
    Buchan, DWA
    Rison, SCG
    Bray, JE
    Lee, D
    Pearl, F
    Thornton, JM
    Orengo, CA
    NUCLEIC ACIDS RESEARCH, 2003, 31 (01) : 469 - 473
  • [9] The Gene3D Web Services: a platform for identifying, annotating and comparing structural domains in protein sequences
    Yeats, Corin
    Lees, Jonathan
    Carter, Phil
    Sillitoe, Ian
    Orengo, Christine
    NUCLEIC ACIDS RESEARCH, 2011, 39 : W546 - W550
  • [10] Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis
    Lees, Jonathan
    Yeats, Corin
    Perkins, James
    Sillitoe, Ian
    Rentzsch, Robert
    Dessailly, Benoit H.
    Orengo, Christine
    NUCLEIC ACIDS RESEARCH, 2012, 40 (D1) : D465 - D471