SEARCHING FOR REPRESENTATIONS TO IMPROVE PROTEIN-SEQUENCE FOLD-CLASS PREDICTION

Times cited: 0

Authors:

IOERGER, TR [1]

RENDELL, LA [1]

SUBRAMANIAM, S [1]

Affiliation:

[1] UNIV ILLINOIS,BECKMAN INST,DEPT PHYSIOL & BIOPHYS,NATL CTR SUPERCOMP APPLICAT,URBANA,IL 61801

Source:

MACHINE LEARNING | 1995 / Vol. 21 / No. 1-2

Keywords:

DOMAIN KNOWLEDGE; CHANGE OF REPRESENTATION; THEORY REVISION; PROTEIN STRUCTURE PREDICTION; HOMOLOGY MODELING; AMINO ACID PROPERTIES;

DOI:

10.1023/A:1022625916438

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Predicting the fold, or approximate 3D structure, of a protein from its amino acid sequence is an important problem in biology. The homology modeling approach uses a protein database to identify fold-class relationships by sequence similarity. The main limitation of this method is that some proteins with similar structures appear to have very different sequences, which we call the "hidden-homology problem." As in other real-world domains for machine learning, this difficulty may be caused by a low-level representation. Learning in such domains can be improved by using domain knowledge to search for representations that better match the inductive bias of a preferred algorithm. In this domain, knowledge of amino acid properties can be used to construct higher-level representations of protein sequences. In one experiment using a 179-protein data set, the accuracy of fold-class prediction was increased from 77.7% to 81.0%. The search results are analyzed to refine the grouping of small residues suggested by Dayhoff. Finally, an extension of the representation incorporates sequential context directly, which lets it express finer relationships among the amino acids. The methods developed in this domain are generalized into a framework that suggests several systematic roles for domain knowledge in machine learning. Knowledge may define both a space of alternative representations and a strategy for searching this space. The search results may be summarized to extract feedback for revising the domain knowledge.
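The abstract combines two ingredients: recoding protein sequences into a higher-level representation built from amino acid properties, and predicting a fold class from the most similar database protein. The Python sketch below illustrates both, under stated assumptions. It uses Dayhoff's six-group alphabet as the example recoding, a plain Needleman-Wunsch global alignment score with arbitrary match/mismatch/gap values as the similarity measure, and a 1-nearest-neighbour rule. The function names (`recode`, `global_alignment_score`, `predict_fold_class`), the scoring parameters, and the toy sequences are hypothetical. This is not the authors' procedure, and it does not reproduce the reported 77.7% and 81.0% accuracies.

```python
# Dayhoff's six amino-acid groups, used here as an example reduced alphabet.
# The group letters "a".."f" are arbitrary labels chosen for this sketch.
DAYHOFF_GROUPS = {
    "C": "a",
    "AGPST": "b",  # small residues
    "DENQ": "c",
    "HKR": "d",
    "ILMV": "e",
    "FWY": "f",
}

# Flatten into a per-residue lookup table: residue -> group letter.
RESIDUE_TO_GROUP = {res: label for members, label in DAYHOFF_GROUPS.items() for res in members}


def recode(sequence: str) -> str:
    """Map each residue to its group letter; unknown symbols become 'x'."""
    return "".join(RESIDUE_TO_GROUP.get(res, "x") for res in sequence.upper())


def global_alignment_score(s: str, t: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score with a linear gap penalty (illustrative parameters)."""
    prev = [j * gap for j in range(len(t) + 1)]  # row for an empty prefix of s
    for i in range(1, len(s) + 1):
        curr = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(t)]


def predict_fold_class(query, database, representation=recode):
    """Return the fold class of the highest-scoring database sequence (1-nearest neighbour)."""
    q = representation(query)
    best_label, best_score = None, None
    for seq, label in database:
        score = global_alignment_score(q, representation(seq))
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy, made-up sequences and fold labels.
    db = [("ILVKDE", "fold-A"), ("FWYCGS", "fold-B")]
    print(predict_fold_class("MLIRNQ", db))  # prints "fold-A"
```

In this toy case, "MLIRNQ" and "ILVKDE" share only one identical residue, yet both recode to "eeedcc", so the reduced alphabet exposes a similarity that the raw residues hide. Passing a different `representation` function (for example, the identity `lambda s: s` for the raw alphabet) is one simple way to compare alternative representations under the same learner. This loosely mirrors the search over representations that the abstract describes.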


Pages: 151-175

Number of pages: 25
