ProtoMap: automatic classification of protein sequences and hierarchy of protein families

Cited by: 108
Authors
Yona, G
Linial, N
Linial, M
Affiliations
[1] Stanford Univ, Dept Biol Struct, Stanford, CA 94305 USA
[2] Hebrew Univ Jerusalem, Inst Comp Sci, IL-91904 Jerusalem, Israel
[3] Hebrew Univ Jerusalem, Inst Life Sci, Dept Biol Chem, IL-91904 Jerusalem, Israel
DOI
10.1093/nar/28.1.49
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database into groups of related proteins. The classification is based on analysis of all pairwise similarities among protein sequences. The analysis makes essential use of transitivity to identify homologies among proteins. Within each group of the classification, every two members are either directly or transitively related. However, transitivity is applied restrictively in order to prevent unrelated proteins from clustering together. The classification is done at different levels of confidence, and yields a hierarchical organization of all proteins. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Many clusters contain protein sequences that are not classified by other databases. The hierarchical organization suggested by our analysis may help in detecting finer subfamilies in families of known proteins. In addition, it brings forth interesting relationships between protein families, upon which local maps for the neighborhood of protein families can be sketched. The ProtoMap web server can be accessed at http://www.protomap.cs.huji.ac.il.
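The idea of grouping proteins by transitive relatedness at several confidence levels can be illustrated with a minimal sketch. This is not ProtoMap's algorithm: it applies plain, unrestricted transitive closure (single linkage via union-find), whereas the abstract states that ProtoMap applies transitivity restrictively. The function names, the toy protein labels, the pairwise scores (treated here as E-value-like numbers where smaller means more similar), and the thresholds are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_at_threshold(proteins, scores, max_score):
    """Group proteins by transitive closure over pairs scoring <= max_score.

    Assumption: `scores` maps (protein_a, protein_b) to an E-value-like
    number, so a smaller value means a stronger similarity.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score <= max_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two groups

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    # Sort the clusters so the output order is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def build_hierarchy(proteins, scores, thresholds):
    """Cluster at each threshold, from the strictest to the most permissive."""
    return {t: cluster_at_threshold(proteins, scores, t) for t in sorted(thresholds)}


# Hypothetical toy data, not taken from SWISS-PROT.
proteins = ["A", "B", "C", "D", "E"]
scores = {
    ("A", "B"): 1e-50,
    ("B", "C"): 1e-20,
    ("C", "D"): 1e-3,
    ("D", "E"): 1e-40,
}
hierarchy = build_hierarchy(proteins, scores, [1e-30, 1e-10, 1e-1])

for threshold, clusters in hierarchy.items():
    print(threshold, [sorted(c) for c in clusters])
```

Because each looser threshold only adds links, every cluster at a strict level is contained in a cluster at each more permissive level, which is what makes the result a hierarchy. At the loosest threshold all five toy proteins collapse into one group through a single weak link (C to D), which shows why unrestricted transitivity can cluster unrelated proteins together.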
Pages: 49-55
Number of pages: 7