Capturing protein sequence-structure specificity using computational sequence design

Cited by: 5

Authors:

Mach, Paul ^{[1]}

Koehl, Patrice ^{[2]}

Affiliations:

[1] Univ Calif Davis, Genome Ctr, Dept Appl Math, Davis, CA 95616 USA

[2] Univ Calif Davis, Genome Ctr, Dept Comp Sci, Davis, CA 95616 USA

Source:

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS | 2013 / Vol. 81 / Issue 09

Keywords:

computational protein sequence design; protein fold recognition; hidden Markov models; sequence threading; SIDE-CHAIN; FOLD SPACE; STABILITY; EVOLUTION; DATABASE; SEARCH; ENERGY; CORE;

DOI:

10.1002/prot.24307

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

It is well known that protein fold recognition can be greatly improved if models for the underlying evolutionary history of the folds are taken into account. The improvement, however, exists only if such evolutionary information is available. To circumvent this limitation for protein families that have only a small number of representatives in current sequence databases, we follow an alternative approach in which the benefits of including evolutionary information are recreated by using sequences generated by computational protein design algorithms. We explore this strategy on a large database of protein templates with 1747 members from different protein families. An automated method is used to design sequences for these templates. We use the backbones from the experimental structures as fixed templates, thread sequences on these backbones using a self-consistent mean field approach, and score the fitness of the corresponding models using a semi-empirical physical potential. Sequences designed for one template are translated into a hidden Markov model-based profile. We describe the implementation of this method, the optimization of its parameters, and its performance. When the native sequences of the protein templates were tested against the library of these profiles, the class, fold, and family memberships of a large majority (>90%) of these sequences were correctly recognized at an E-value threshold of 1. In contrast, when homologous sequences were tested against the same library, a much smaller fraction (35%) of sequences were recognized; the structural classification of the protein families corresponding to these sequences, however, was correctly recognized (with an accuracy of >88%). Proteins 2013; (c) 2013 Wiley Periodicals, Inc.
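
The step of turning the sequences designed for one template into a profile can be illustrated with a much simpler stand-in. The sketch below is not the hidden Markov model profile used in the paper: it assumes an ungapped position-specific log-odds profile, a uniform amino-acid background, and a flat pseudocount, and the names `build_profile` and `score` are hypothetical. It relies only on the fact that sequences designed on a fixed backbone all have the template's length.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(sequences, pseudocount=1.0):
    """Build per-position log-odds scores from equal-length designed sequences.

    Assumption: uniform background frequencies and a flat pseudocount;
    a real HMM profile would also model insertions and deletions.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all designed sequences must have the template length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        # Count how often each residue type was designed at this position.
        counts = Counter(s[pos] for s in sequences)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, query):
    """Sum the log-odds of the query residues, position by position (no gaps)."""
    if len(query) != len(profile):
        raise ValueError("query must have the same length as the profile")
    return sum(column[aa] for column, aa in zip(profile, query))


if __name__ == "__main__":
    # Toy "designed" sequences for a hypothetical 4-residue template.
    designed = ["ACDA", "ACDE", "ACEA"]
    profile = build_profile(designed)
    print(score(profile, "ACDA"), score(profile, "WWWW"))
```

A query that resembles the designed sequences scores higher than an unrelated one. Converting such raw scores into E-values, as in the recognition tests described in the abstract, requires a score distribution over a sequence database and is outside this sketch.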

Pages: 1556-1570

Page count: 15
