Markovian structures in biological sequence alignments

Cited by: 50
Authors
Liu, JS [1]
Neuwald, AF
Lawrence, CE
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[3] New York State Dept Hlth, Wadsworth Ctr Labs & Res, Biometr Lab, Albany, NY 12201 USA
Keywords
DNA sequence; evolution; Gibbs sampler; GTPase; hidden Markov model; MAP criterion; model selection; protein sequence; sequence comparisons
DOI
10.2307/2669673
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
The alignment of multiple homologous biopolymer sequences is crucial in research on protein modeling and engineering, molecular evolution, and the prediction of both gene function and gene product structure. In this article we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, and use it to develop a set of new algorithms that have both the sensitivity of the block-based model and the flexibility of the HMM. In particular, we decompose the standard HMM into two components: the insertion component, which is captured by the so-called "propagation model," and the deletion component, which is described by a deletion vector. Such a decomposition serves as a basis for rational compromise between biological specificity and model flexibility. Furthermore, we introduce a Bayesian model selection criterion that, in combination with the propagation model, a genetic algorithm, and other computational aspects, forms the core of PROBE, a multiple alignment and database search methodology. The application of our method to a GTPase family of protein sequences yields an alignment that is confirmed by comparison with known tertiary structures.
Pages: 1 / 15
Page count: 15