Optimisation of HMM Topologies Enhances DNA and Protein Sequence Modelling

Cited by: 1
Authors
Friedrich, Torben [1]
Koetschan, Christian [1]
Mueller, Tobias [1]
Affiliation
[1] Univ Wurzburg, D-97070 Wurzburg, Germany
Keywords
artificial intelligence; biostatistics; hidden Markov models; mathematical modelling; pattern recognition; statistics; secondary structure; prediction; ITS2; database
DOI
10.2202/1544-6115.1480
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Hidden Markov models (HMMs) play a major role in applications that unravel biomolecular functionality. Although HMMs are technically mature and widely applied in computational biology, there is potential for methodical optimisation of how they model biological data sources with varying sequence lengths. The single building blocks of these models, the states, are each associated with a certain holding time, which is the link to the length distribution of the represented sequence motifs. Regular HMM topologies are adapted to bell-shaped sequence length distributions by serially chain-linking hidden states, which stays within the class of conventional hidden Markov models. The number of state repetitions (r) and the parameter for the state-specific duration of stay (p) are determined by fitting the distribution of sequence lengths with the method of moments (MM) and maximum likelihood (ML). Performance evaluations of differently adjusted HMM topologies underline the impact of optimising HMMs based on sequence lengths. Secondary structure prediction on internal transcribed spacer 2 sequences serves as an example of the general impact of topological optimisations. In summary, we propose a general methodology to improve the modelling behaviour of HMMs by topological optimisation with ML and with a fast and easily implementable moment estimator.
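To make the moment-fitting idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a chain of r identical states, each emitting at least one symbol and repeating itself with probability `stay` after every symbol. Under that assumption the total length has mean r / (1 - stay) and variance r * stay / (1 - stay)^2, which can be solved for both parameters. The function name `fit_chain_by_moments` and the `stay` parameterisation are illustrative choices; the paper's p may be defined differently.

```python
from math import floor
from statistics import fmean, pvariance


def fit_chain_by_moments(lengths):
    """Moment estimate of (r, stay) for a chain of r identical states.

    Assumed model (an illustration, not taken from the paper): every state
    emits one symbol, then repeats itself with probability `stay` or moves
    on with probability 1 - stay. The motif length is then a sum of r
    geometric variables with
        mean     = r / (1 - stay)
        variance = r * stay / (1 - stay) ** 2
    """
    mean = fmean(lengths)
    var = pvariance(lengths)

    # Dividing variance by mean gives stay / (1 - stay), hence
    # r = mean ** 2 / (mean + var) before rounding.
    r_real = mean * mean / (mean + var)

    # r must be a whole number of states, at least 1 and at most the
    # mean length, because each state emits at least one symbol.
    r = max(1, min(round(r_real), floor(mean)))

    # Re-solve for `stay` so that the rounded chain reproduces the mean.
    stay = 1.0 - r / mean
    return r, stay


# Hypothetical motif lengths, for illustration only.
example_lengths = [8, 10, 12, 10, 9, 11]
r_hat, stay_hat = fit_chain_by_moments(example_lengths)
```

Rounding r to an integer means the variance is matched only approximately; re-solving `stay` after rounding keeps the mean length exact, which is one reasonable design choice among several.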
Pages: 27