Markovian structures in biological sequence alignments

Cited by: 50

Authors:

Liu, JS [1]

Neuwald, AF

Lawrence, CE

Affiliations:

[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA

[2] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA

[3] New York State Dept Hlth, Wadsworth Ctr Labs & Res, Biometr Lab, Albany, NY 12201 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 1999 / Vol. 94 / No. 445

Keywords:

DNA sequence; evolution; Gibbs sampler; GTPase; hidden Markov model; MAP criterion; model selection; protein sequence; sequence comparisons

DOI:

10.2307/2669673

Chinese Library Classification:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]

Discipline classification codes:

020208; 070103; 0714

Abstract:

The alignment of multiple homologous biopolymer sequences is crucial in research on protein modeling and engineering, molecular evolution, and prediction in terms of both gene function and gene product structure. In this article we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, to develop a set of new algorithms that have both the sensitivity of the block-based model and the flexibility of the HMM. In particular, we decompose the standard HMM into two components: the insertion component, which is captured by the so-called "propagation model," and the deletion component, which is described by a deletion vector. Such a decomposition serves as a basis for rational compromise between biological specificity and model flexibility. Furthermore, we introduce a Bayesian model selection criterion that, in combination with the propagation model, genetic algorithm, and other computational aspects, forms the core of PROBE, a multiple alignment and database search methodology. The application of our method to a GTPase family of protein sequences yields an alignment that is confirmed by comparison with known tertiary structures.


Pages: 1-15

Page count: 15
