Optimisation of HMM Topologies Enhances DNA and Protein Sequence Modelling

Cited by: 1
Authors
Friedrich, Torben [1]
Koetschan, Christian [1]
Mueller, Tobias [1]
Affiliations
[1] Univ Wurzburg, D-97070 Wurzburg, Germany
Keywords
artificial intelligence; biostatistics; hidden Markov models; mathematical modelling; pattern recognition; statistics; HIDDEN MARKOV-MODELS; SECONDARY STRUCTURE; PREDICTION; ITS2; DATABASE;
DOI
10.2202/1544-6115.1480
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Hidden Markov models (HMMs) play a major role in applications that unravel biomolecular functionality. Though HMMs are technically mature and widely applied in computational biology, there is potential for methodological optimisation of how they model biological data sources with varying sequence lengths. The single building blocks of these models, the states, are each associated with a certain holding time, which is the link to the length distribution of the represented sequence motifs. Regular HMM topologies are adapted to bell-shaped sequence length distributions by serially chain-linking hidden states, while remaining within the class of conventional hidden Markov models. The number of state repetitions (r) and the parameter for the state-specific duration of stay (p) are determined by fitting the distribution of sequence lengths with the method of moments (MM) and maximum likelihood (ML). Performance evaluations of differently adjusted HMM topologies underline the impact of optimising HMMs based on sequence lengths. Secondary structure prediction on internal transcribed spacer 2 sequences serves as an example of the general impact of topological optimisations. In summary, we propose a general methodology to improve the modelling behaviour of HMMs by topological optimisation with ML and a fast and easily implementable moment estimator.
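The moment fit mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that each of the r chained states has the same self-transition probability p, so that the time spent in one state is geometric with mean 1/(1-p) and variance p/(1-p)^2, and the total length over r states has mean r/(1-p) and variance r*p/(1-p)^2. Matching these to the sample mean m and variance v gives p = v/(m+v) and r = m^2/(m+v). The function name `fit_chain_by_moments` and the rounding step are illustrative assumptions.

```python
from statistics import fmean, pvariance


def fit_chain_by_moments(lengths):
    """Estimate (r, p) for a chain of r states from observed sequence lengths.

    Assumption (not from the source): every state shares one
    self-transition probability p, so the total length is a sum of r
    geometric holding times.
    """
    m = fmean(lengths)            # sample mean of the lengths
    v = pvariance(lengths, mu=m)  # population variance of the lengths

    # Moment equations: m = r / (1 - p), v = r * p / (1 - p)**2
    p = v / (m + v)

    # r must be a positive integer, so round the real-valued solution.
    r = max(1, round(m * (1 - p)))

    # Re-solve p for the rounded r so that the mean is still matched;
    # clamp at 0 in case rounding pushed r above the mean length.
    p = max(0.0, 1.0 - r / m)
    return r, p


if __name__ == "__main__":
    r, p = fit_chain_by_moments([10, 20, 30])
    print(r, p)
```

Rounding r and then re-solving p is one simple way to keep the fitted mean exact; it trades off some accuracy in the variance, which a maximum likelihood fit over candidate values of r could recover.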
Pages: 27