MetaFam: a unified classification of protein families. I. Overview and statistics

Cited by: 15
Authors
Silverstein, KAT [1 ]
Shoop, E [1 ]
Johnson, JE [1 ]
Retzel, EF [1 ]
Affiliations
[1] Univ Minnesota, Acad Hlth Ctr, Comp Biol Ctr, Minneapolis, MN 55455 USA
Funding
US National Science Foundation
DOI
10.1093/bioinformatics/17.3.249
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Protein sequence classification is becoming an increasingly important means of organizing the voluminous data produced by large-scale genome sequencing projects. At present, there are several independent classification methods. To aid the general classification effort, we have created a unified protein family resource, MetaFam. MetaFam is a protein family classification built upon 10 publicly accessible protein family databases (Blocks+, DOMO, Pfam, PIR-ALN, PRINTS, PROSITE, ProDom, PROTOMAP, SBASE, and SYSTERS). MetaFam's family 'supersets', as we call them, are created automatically using set theory to compare families among the databases. Families of one database are matched to those in another when the intersection of their members exceeds all other possible family pairings between the two databases. Pairwise family matches are drawn together transitively to create a new list of protein family supersets.
Results: MetaFam family supersets have several useful features: (1) each superset contains more members than the families from which it is composed, because each of the component family databases only works with a subset of our full non-redundant set of proteins; (2) conflicting assignments can be pinpointed quickly, since our analysis identifies individual members that are in conflict with the majority consensus; (3) family descriptions that are absent from automated databases can frequently be assigned; (4) statistics have been computed comparing domain boundaries, family size distributions, and overall quality of MetaFam supersets; (5) the supersets have been loaded into a relational database to allow for complex queries and visualization of the connections among families in a superset and the consensus of individual domain members; and (6) the quality of individual supersets has been assessed using numerous quantitative measures such as family consistency, connectedness, and size.
We anticipate this new resource will be particularly useful to genomic database curators.
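The matching and transitive grouping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how ties or one-sided matches are handled, so this sketch assumes a match requires a strictly largest, non-empty intersection in both directions (a reciprocal best match). The database names, family names, and protein identifiers are invented toy data, and the function names `best_matches`, `build_supersets`, and `superset_members` are hypothetical.

```python
from itertools import combinations


def best_matches(db_a, db_b):
    """Pair families of two databases by largest shared membership.

    db_a, db_b: dict mapping family name -> set of protein IDs.
    Assumption: a pair counts only if each family is the other's
    strictly best, non-empty overlap (reciprocal best match).
    """
    def best(src, dst):
        out = {}
        for name, members in src.items():
            scored = [(len(members & other), other_name)
                      for other_name, other in dst.items()]
            top = max((s for s, _ in scored), default=0)
            winners = [n for s, n in scored if s == top]
            # "Exceeds all other pairings": require a unique, non-zero maximum.
            if top > 0 and len(winners) == 1:
                out[name] = winners[0]
        return out

    a_to_b = best(db_a, db_b)
    b_to_a = best(db_b, db_a)
    return [(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a]


def build_supersets(databases):
    """Join pairwise matches transitively using a union-find structure.

    databases: dict mapping database name -> {family name -> set of IDs}.
    Returns a list of supersets, each a set of (database, family) pairs.
    """
    parent = {(db, fam): (db, fam)
              for db, fams in databases.items() for fam in fams}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for db1, db2 in combinations(databases, 2):
        for fam1, fam2 in best_matches(databases[db1], databases[db2]):
            parent[find((db1, fam1))] = find((db2, fam2))

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def superset_members(databases, superset):
    """Union of the protein IDs of every family in one superset."""
    members = set()
    for db, fam in superset:
        members |= databases[db][fam]
    return members


# Toy data (invented): three databases covering overlapping proteins.
toy = {
    "DbA": {"A1": {"p1", "p2", "p3"}, "A2": {"p7", "p8"}},
    "DbB": {"B1": {"p2", "p3", "p4"}, "B2": {"p8", "p9"}},
    "DbC": {"C1": {"p4", "p5"}},
}
supersets = build_supersets(toy)
for superset in supersets:
    print(sorted(superset), sorted(superset_members(toy, superset)))
```

In the toy data, `A1` and `C1` share no proteins, yet they fall into the same superset because each matches `B1`. The resulting superset also has more members than any single family in it, which mirrors feature (1) of the Results.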
Pages: 249-261
Page count: 13