Integrating database homology in a probabilistic gene structure model

Authors
Kulp, D
Haussler, D
Reese, MG
Eeckman, FH
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We present an improved stochastic model of genes in DNA, and describe a method for integrating database homology into the probabilistic framework. A generalized hidden Markov model (GHMM) describes the grammar of a legal parse of a DNA sequence. Probabilities are estimated for gene features by using dynamic programming to combine information from multiple sensors. We show how matches to homologous sequences from a database can be integrated into the probability estimation by interpreting the likelihood of a sequence in terms of the bit-cost to encode a sequence given a homology match. We also demonstrate how homology matches in protein databases can be exploited to help identify splice sites. Our experiments show significant improvements in the sensitivity and specificity of gene structure identification when these new features are added to our gene-finding system, Genie. Experimental results in tests using a standard set of annotated genes showed that Genie identified 95% of coding nucleotides correctly with a specificity of 91%, and 77% of exons were identified exactly.
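The abstract's core idea, a dynamic-programming parse over variable-length segments scored in bits, can be illustrated with a minimal sketch. This is not Genie's implementation: the two-state grammar, the base frequencies in EMIT, the geometric length model (MEAN_LEN) and the names bits, segment_bits and best_parse are all hypothetical choices made for illustration. Homology matches and splice-site sensors are not modelled; the sketch only shows how a generalized HMM can pick the cheapest legal parse when every probability is read as an encoding cost of -log2(p) bits.

```python
import math

# Hypothetical toy parameters, NOT taken from Genie.
# Per-state base frequencies play the role of a single "content sensor".
EMIT = {
    "intergenic": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "exon":       {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}
# Legal state-to-state moves of the toy grammar, with probabilities.
TRANS = {"intergenic": {"exon": 1.0}, "exon": {"intergenic": 1.0}}
START = {"intergenic": 1.0}   # a parse must begin in this state
MAX_LEN = 50                  # longest segment the search considers
MEAN_LEN = 4.0                # mean of the assumed geometric length model


def bits(p):
    """Cost in bits of an event with probability p."""
    return -math.log2(p)


def segment_bits(state, seq, i, j):
    """Bits needed to encode seq[i:j] as one segment of the given state."""
    content = sum(bits(EMIT[state][base]) for base in seq[i:j])
    p = 1.0 / MEAN_LEN
    length = bits(p * (1.0 - p) ** (j - i - 1))  # geometric length cost
    return content + length


def best_parse(seq):
    """Return (total_bits, segments) for the cheapest legal parse of a
    non-empty sequence; segments are (state, start, end) tuples."""
    n = len(seq)
    # best[j][state] = (cost, segment_start, previous_state) for the
    # cheapest parse of seq[:j] whose last segment has this state.
    best = [dict() for _ in range(n + 1)]
    for j in range(1, n + 1):
        for state in EMIT:
            for i in range(max(0, j - MAX_LEN), j):
                seg = segment_bits(state, seq, i, j)
                if i == 0:
                    candidates = (
                        [(bits(START[state]) + seg, None)]
                        if state in START else []
                    )
                else:
                    candidates = [
                        (cost + bits(TRANS[prev][state]) + seg, prev)
                        for prev, (cost, _, _) in best[i].items()
                        if state in TRANS[prev]
                    ]
                for cost, prev in candidates:
                    if state not in best[j] or cost < best[j][state][0]:
                        best[j][state] = (cost, i, prev)
    # Trace back from the cheapest final state to recover the segments.
    state = min(best[n], key=lambda s: best[n][s][0])
    total = best[n][state][0]
    segments, j = [], n
    while state is not None:
        _, i, prev = best[j][state]
        segments.append((state, i, j))
        state, j = prev, i
    return total, segments[::-1]


total, segments = best_parse("ATATATAT" + "GCGCGCGC" + "TATATATA")
```

In this toy setting each extra segment costs additional bits through the length model, so the search only opens a new segment when the content model saves more bits than the segment costs. A real gene finder would replace the single content sensor with several trained sensors and a richer grammar.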
Pages: 232-244
Page count: 13