RELATIONS BETWEEN PROTEIN-SEQUENCE AND STRUCTURE AND THEIR SIGNIFICANCE

Cited by: 46
Authors
ROOMAN, MJ
RODRIGUEZ, J
WODAK, SJ
Affiliation
[1] Unité de Conformation des Macromolécules Biologiques Université Libre de Bruxelles, CP160, P2
DOI
10.1016/S0022-2836(05)80195-0
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The relation between amino acid sequence and local structure in proteins is investigated. The local structures considered are either the four classes of secondary structure (H, E, T and C) or four classes of local conformations defined using measures of conformational similarity based on distances between Cα atoms. The classes are obtained by applying an automatic clustering procedure to short polypeptide fragments of uniform length from a database of 75 known protein structures. The thrust of our investigation consists of systematically searching the database for simple amino acid patterns of the type Gly-X-Ala-X-X-Val, where X denotes an arbitrary residue. Patterns that are nearly always associated with the same structure are retained. Finding many such associations, we then evaluate by a statistical approach how many among them are non-random and compare the results for different definitions of local structure. A similar comparison is made for the predictive value of retained associations, which is assessed using an internal test based on dividing the database into "learning" and "test" subsets. While we find that local structures defined by conformational similarity are not superior to secondary structure for prediction purposes, they help us gain insight into the factors that influence the predictive value of derived associations. A major conclusion is that the number of retained associations is in large excess over the number expected from a random correlation between sequence and structure, irrespective of how local conformation is defined. However, only a very small number of these associations can be earmarked as reliable using statistical criteria, due to the limited size of the database. We find, for instance, that the pattern Ala-Ala-X-X-Lys reliably characterizes helix, and the pattern Val-X-Val-X-X-X-Ala reliably characterizes extended structure and β-strand. 
The possibility is discussed that these and other reliable associations correspond to regions of the polypeptide chain whose conformations are locally determined and that these regions may play a role in folding. © 1990 Academic Press Limited.
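The pattern search described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the thresholds (`min_occurrences`, `min_fraction`), the choice to label each occurrence by the structure class at its first residue, and the toy data are all assumptions made for illustration, and no statistical test of non-randomness is included.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pattern_counts(chains, window=5, n_fixed=3):
    """Count the structure classes observed for each gapped pattern.

    chains: list of (sequence, structure) string pairs of equal length,
            with one-letter residues and one class letter per residue.
    A pattern is a tuple of (offset, residue) pairs. Offset 0 is always
    fixed; offsets that are not listed play the role of X.
    """
    counts = defaultdict(Counter)
    for seq, struct in chains:
        if len(seq) != len(struct):
            raise ValueError("sequence and structure must have equal length")
        for start in range(len(seq)):
            # Choose the remaining fixed offsets after offset 0.
            for rest in combinations(range(1, window), n_fixed - 1):
                offsets = (0,) + rest
                if start + offsets[-1] >= len(seq):
                    continue
                pattern = tuple((o, seq[start + o]) for o in offsets)
                # Assumption: label the occurrence by the class at its first residue.
                counts[pattern][struct[start]] += 1
    return counts


def retained(counts, min_occurrences=3, min_fraction=0.9):
    """Keep patterns whose dominant class covers at least min_fraction of occurrences."""
    kept = {}
    for pattern, classes in counts.items():
        total = sum(classes.values())
        label, top = classes.most_common(1)[0]
        if total >= min_occurrences and top / total >= min_fraction:
            kept[pattern] = (label, top, total)
    return kept


def show(pattern):
    """Render ((0, 'A'), (1, 'A'), (4, 'K')) as 'A-A-X-X-K'."""
    span = pattern[-1][0] + 1
    fixed = dict(pattern)
    return "-".join(fixed.get(i, "X") for i in range(span))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the 75-protein database.
    toy = [("AAGLK", "HHHHH"), ("AAPEK", "HHHHC"), ("AASTK", "HHHHH")]
    for pattern, (label, top, total) in retained(pattern_counts(toy)).items():
        print(show(pattern), label, f"{top}/{total}")
```

With only three toy chains, the single retained pattern is of no statistical weight; the abstract's point is that most such associations cannot be called reliable in a small database, so any real use would need a significance estimate and a learning/test split.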
Pages: 337-350
Number of pages: 14