Integrating database homology in a probabilistic gene structure model

Cited by: 0
Authors
Kulp, D
Haussler, D
Reese, MG
Eeckman, FH
Institutions
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We present an improved stochastic model of genes in DNA, and describe a method for integrating database homology into the probabilistic framework. A generalized hidden Markov model (GHMM) describes the grammar of a legal parse of a DNA sequence. Probabilities are estimated for gene features by using dynamic programming to combine information from multiple sensors. We show how matches to homologous sequences from a database can be integrated into the probability estimation by interpreting the likelihood of a sequence in terms of the bit-cost to encode a sequence given a homology match. We also demonstrate how homology matches in protein databases can be exploited to help identify splice sites. Our experiments show significant improvements in the sensitivity and specificity of gene structure identification when these new features are added to our gene-finding system, Genie. Experimental results in tests using a standard set of annotated genes showed that Genie identified 95% of coding nucleotides correctly with a specificity of 91%, and 77% of exons were identified exactly.
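The abstract's two central ideas, parsing a sequence by dynamic programming over variable-length segments and scoring each segment by its encoding cost in bits, can be illustrated with a minimal sketch. This is not Genie's implementation: the two states, the emission probabilities in `EMIT`, the zero-bit transitions in `TRANSITION_BITS`, and the function names `segment_bits` and `best_parse` are all hypothetical, chosen only to keep the example small. A cost of b bits corresponds to a probability of 2^-b, so the cheapest parse is also the most likely one under this toy model.

```python
import math

# Hypothetical per-nucleotide emission probabilities for two toy states.
EMIT = {
    "noncoding": {"A": 0.9, "G": 0.1},
    "coding": {"A": 0.1, "G": 0.9},
}
# Allowed state changes and their cost in bits (0 bits = probability 1).
# A state cannot follow itself, so adjacent segments always differ.
TRANSITION_BITS = {
    "noncoding": {"coding": 0.0},
    "coding": {"noncoding": 0.0},
}

def segment_bits(state, segment):
    """Toy content sensor: bits needed to encode `segment` under `state`."""
    return sum(-math.log2(EMIT[state][base]) for base in segment)

def best_parse(seq):
    """Return (total_bits, [(state, start, end), ...]) for the cheapest parse."""
    n = len(seq)
    if n == 0:
        return 0.0, []
    states = list(EMIT)
    inf = float("inf")
    # cost[i][s]: fewest bits to encode seq[:i] when the last segment has state s
    cost = [{s: inf for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for s in states:
            for start in range(end):
                bits = segment_bits(s, seq[start:end])
                if start == 0:
                    # First segment: no transition cost (an assumption of this sketch).
                    options = [(bits, None)]
                else:
                    options = [
                        (cost[start][p] + t[s] + bits, p)
                        for p, t in TRANSITION_BITS.items()
                        if s in t
                    ]
                for total, prev in options:
                    if total < cost[end][s]:
                        cost[end][s] = total
                        back[end][s] = (start, prev)
    # Trace back from the cheapest final state.
    state = min(states, key=lambda s: cost[n][s])
    total = cost[n][state]
    parse, end = [], n
    while state is not None:
        start, prev = back[end][state]
        parse.append((state, start, end))
        state, end = prev, start
    return total, parse[::-1]
```

Calling `best_parse("AAAAGGGGAAAA")` splits the toy sequence into a noncoding, a coding, and a noncoding segment. In a fuller model, `segment_bits` would be replaced by a combination of several sensors, and a database homology match would lower the bit-cost of the segments it supports. The triple loop is cubic in sequence length, which is acceptable only for a toy input.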
Pages: 232-244
Page count: 13