Integrating database homology in a probabilistic gene structure model

Authors
Kulp, D
Haussler, D
Reese, MG
Eeckman, FH
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
We present an improved stochastic model of genes in DNA, and describe a method for integrating database homology into the probabilistic framework. A generalized hidden Markov model (GHMM) describes the grammar of a legal parse of a DNA sequence. Probabilities are estimated for gene features by using dynamic programming to combine information from multiple sensors. We show how matches to homologous sequences from a database can be integrated into the probability estimation by interpreting the likelihood of a sequence in terms of the bit-cost to encode a sequence given a homology match. We also demonstrate how homology matches in protein databases can be exploited to help identify splice sites. Our experiments show significant improvements in the sensitivity and specificity of gene structure identification when these new features are added to our gene-finding system, Genie. Experimental results in tests using a standard set of annotated genes showed that Genie identified 95% of coding nucleotides correctly with a specificity of 91%, and 77% of exons were identified exactly.
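The abstract's central idea, scoring a parse by the number of bits needed to encode the sequence and letting a homology match lower that cost, can be illustrated with a toy dynamic program. This is a minimal sketch under stated assumptions, not Genie's implementation: the two segment types, the base probabilities, `SWITCH_BITS`, `HIT_BITS`, `MAX_LEN`, and the functions `base_bits` and `parse` are all hypothetical and chosen only for illustration.

```python
import math

# All names and numbers below are illustrative assumptions, not values from the paper.
STATES = ("noncoding", "coding")
SWITCH_BITS = 4.0   # assumed cost of opening a new segment
HIT_BITS = 0.5      # assumed cost per base when a database hit covers it
MAX_LEN = 200       # assumed cap on segment length


def base_bits(state, base, in_hit):
    """Bits needed to encode one base under the given segment type."""
    if state == "coding":
        if in_hit:
            return HIT_BITS  # a homology match makes the base cheap to encode
        return -math.log2(0.35 if base in "GC" else 0.15)
    return 2.0  # noncoding: uniform over A, C, G, T


def parse(seq, hits=frozenset()):
    """Return (total_bits, segments) for the cheapest alternating parse."""
    n = len(seq)
    if n == 0:
        return 0.0, []
    # Prefix sums, so any segment cost is a difference of two entries.
    cum = {s: [0.0] * (n + 1) for s in STATES}
    for s in STATES:
        for i, b in enumerate(seq):
            cum[s][i + 1] = cum[s][i] + base_bits(s, b, i in hits)

    # best[j][s]: fewest bits to encode seq[:j] with a final segment of type s.
    best = [{s: math.inf for s in STATES} for _ in range(n + 1)]
    back = [{s: None for s in STATES} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in STATES:
            other = STATES[1 - STATES.index(s)]
            for i in range(max(0, j - MAX_LEN), j):
                # The first segment is free to open; later ones pay SWITCH_BITS.
                prev = 0.0 if i == 0 else best[i][other] + SWITCH_BITS
                cost = prev + cum[s][j] - cum[s][i]
                if cost < best[j][s]:
                    best[j][s] = cost
                    back[j][s] = i

    # Trace back from the cheaper final state, alternating segment types.
    state = min(STATES, key=lambda s: best[n][s])
    total = best[n][state]
    segments, j = [], n
    while j > 0:
        i = back[j][state]
        segments.append((state, i, j))
        state = STATES[1 - STATES.index(state)]
        j = i
    return total, segments[::-1]
```

Calling `parse("ATATATAT" + "GCGCGCGCGCGC" + "ATATATAT")` with no hits yields a single noncoding segment, because the modest compositional signal does not pay for two segment boundaries. Passing `hits=frozenset(range(8, 20))` makes the middle stretch cheap enough to encode as coding that the three-segment parse becomes the cheapest one. A real gene model would use more states (exons, introns, splice sites), learned sensors, and explicit length distributions.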
Pages: 232-244
Page count: 13