Capturing protein sequence-structure specificity using computational sequence design

Cited by: 5
Authors
Mach, Paul [1]
Koehl, Patrice [2]
Affiliations
[1] Univ Calif Davis, Genome Ctr, Dept Appl Math, Davis, CA 95616 USA
[2] Univ Calif Davis, Genome Ctr, Dept Comp Sci, Davis, CA 95616 USA
Keywords
computational protein sequence design; protein fold recognition; hidden Markov models; sequence threading; SIDE-CHAIN; FOLD SPACE; STABILITY; EVOLUTION; DATABASE; SEARCH; ENERGY; CORE
DOI
10.1002/prot.24307
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
It is well known that protein fold recognition can be greatly improved if models for the underlying evolutionary history of the folds are taken into account. The improvement, however, exists only if such evolutionary information is available. To circumvent this limitation for protein families that only have a small number of representatives in current sequence databases, we follow an alternative approach in which the benefits of including evolutionary information can be recreated by using sequences generated by computational protein design algorithms. We explore this strategy on a large database of protein templates with 1747 members from different protein families. An automated method is used to design sequences for these templates. We use the backbones from the experimental structures as fixed templates, thread sequences on these backbones using a self-consistent mean field approach, and score the fitness of the corresponding models using a semi-empirical physical potential. Sequences designed for one template are translated into a hidden Markov model-based profile. We describe the implementation of this method, the optimization of its parameters, and its performance. When the native sequences of the protein templates were tested against the library of these profiles, the class, fold, and family memberships of a large majority (>90%) of these sequences were correctly recognized for an E-value threshold of 1. In contrast, when homologous sequences were tested against the same library, a much smaller fraction (35%) of sequences was recognized. The structural classification of the protein families corresponding to these sequences, however, was correctly recognized (with an accuracy of >88%). Proteins 2013; © 2013 Wiley Periodicals, Inc.
Pages: 1556-1570
Page count: 15