SEARCHING FOR REPRESENTATIONS TO IMPROVE PROTEIN-SEQUENCE FOLD-CLASS PREDICTION

Authors
IOERGER, TR [1]
RENDELL, LA [1]
SUBRAMANIAM, S [1]
Affiliation
[1] UNIV ILLINOIS, BECKMAN INST, DEPT PHYSIOL & BIOPHYS, NATL CTR SUPERCOMP APPLICAT, URBANA, IL 61801
Keywords
DOMAIN KNOWLEDGE; CHANGE OF REPRESENTATION; THEORY REVISION; PROTEIN STRUCTURE PREDICTION; HOMOLOGY MODELING; AMINO ACID PROPERTIES;
DOI
10.1023/A:1022625916438
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Predicting the fold, or approximate 3D structure, of a protein from its amino acid sequence is an important problem in biology. The homology modeling approach uses a protein database to identify fold-class relationships by sequence similarity. The main limitation of this method is that some proteins with similar structures appear to have very different sequences, which we call the "hidden-homology problem." As in other real-world domains for machine learning, this difficulty may be caused by a low-level representation. Learning in such domains can be improved by using domain knowledge to search for representations that better match the inductive bias of a preferred algorithm. In this domain, knowledge of amino acid properties can be used to construct higher-level representations of protein sequences. In one experiment using a 179-protein data set, the accuracy of fold-class prediction was increased from 77.7% to 81.0%. The search results are analyzed to refine the grouping of small residues suggested by Dayhoff. Finally, an extension incorporates sequential context directly into the representation, which allows it to express finer relationships among the amino acids. The methods developed in this domain are generalized into a framework that suggests several systematic roles for domain knowledge in machine learning. Knowledge may define both a space of alternative representations and a strategy for searching this space. The search results may be summarized to extract feedback for revising the domain knowledge.
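The idea of a higher-level representation built from amino acid properties can be illustrated with a minimal sketch. It assumes the commonly cited six-class Dayhoff grouping as one possible representation; the groupings actually searched in the paper are not given in this record and may differ. The names `DAYHOFF_CLASSES`, `recode`, and `identity`, as well as the two toy sequences, are hypothetical and chosen only for illustration.

```python
# Commonly cited six-class Dayhoff grouping of the 20 amino acids.
# Assumption: used here only as an example of a property-based
# representation; it is not taken from the paper itself.
DAYHOFF_CLASSES = {
    "a": "C",      # cysteine
    "b": "AGPST",  # small residues
    "c": "DENQ",   # acids and amides
    "d": "HKR",    # basic residues
    "e": "ILMV",   # aliphatic hydrophobic residues
    "f": "FWY",    # aromatic residues
}

# Invert the grouping into a residue -> class-label lookup table.
RESIDUE_TO_CLASS = {
    residue: label
    for label, members in DAYHOFF_CLASSES.items()
    for residue in members
}


def recode(sequence, mapping=RESIDUE_TO_CLASS):
    """Rewrite a one-letter protein sequence in the grouped alphabet.

    Residues missing from the mapping are written as 'x'.
    """
    return "".join(mapping.get(residue, "x") for residue in sequence.upper())


def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two toy, already aligned fragments (hypothetical, not real proteins).
seq1 = "AKLVE"
seq2 = "SRIMD"

raw_identity = identity(seq1, seq2)
grouped_identity = identity(recode(seq1), recode(seq2))
```

In this toy case the two fragments share no residues at the raw level, yet they are identical after regrouping, which is the kind of similarity a low-level representation hides. Searching over alternative groupings, as the abstract describes, would amount to varying the mapping and measuring prediction accuracy for each candidate.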
Pages: 151-175
Page count: 25