Automatic classification of protein sequences into structure/function groups via parallel cascade identification: A feasibility study

Cited by: 14

Authors:

Korenberg, MJ [1]

David, R

Hunter, IW

Solomon, JE

Affiliations:

[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada

[2] MIT, Dept Mech Engn, Cambridge, MA 02139 USA

[3] CALTECH, Beckman Inst, Ctr Computat Biol, Pasadena, CA 91125 USA

Source:

ANNALS OF BIOMEDICAL ENGINEERING | 2000 / Volume 28 / Issue 07

Keywords:

protein sequence classification; nonlinear system identification; binary sequences; SARAH codes;

DOI:

10.1114/1.1289470

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831 ;

Abstract:

A recent paper introduced the approach of using nonlinear system identification as a means for automatically classifying protein sequences into their structure/function families. The particular technique utilized, known as parallel cascade identification (PCI), could train classifiers on a very limited set of exemplars from the protein families to be distinguished and still achieve impressively good two-way classifications. For the nonlinear system classifiers to have numerical inputs, each amino acid in the protein was mapped into a corresponding hydrophobicity value, and the resulting hydrophobicity profile was used in place of the primary amino acid sequence. While the ensuing classification accuracy was gratifying, the use of (Rose scale) hydrophobicity values had some disadvantages. These included representing multiple amino acids by the same value, weighting some amino acids more heavily than others, and covering a narrow numerical range, resulting in a poor input for system identification. This paper introduces binary and multilevel sequence codes to represent amino acids, for use in protein classification. The new binary and multilevel sequences, which are still able to encode information such as hydrophobicity, polarity, and charge, avoid the above disadvantages and increase classification accuracy. Indeed, over a much larger test set than in the original study, parallel cascade models using numerical profiles constructed with the new codes achieved slightly higher two-way classification rates than did hidden Markov models (HMMs) using the primary amino acid sequences, and combining PCI and HMM approaches increased accuracy. (C) 2000 Biomedical Engineering Society. [S0090-6964(00)00607-X].


Pages: 803-811

Page count: 9

