The Distribution of Short Word Match Counts between Markovian Sequences

Authors
Burden, Conrad J. [1]
Leopardi, Paul [1]
Foret, Sylvain [2]
Affiliations
[1] Australian Natl Univ, Math Sci Inst, GPO Box 4, Canberra, ACT 0200, Australia
[2] Australian Natl Univ, Res Sch Biol, Canberra, ACT 0200, Australia
Funding
Australian Research Council
Keywords
Word Matches; Biological Sequence Comparison;
DOI
Not available
Chinese Library Classification number
R-058
Subject classification number
Abstract
The D2 statistic, which counts the number of word matches between two given sequences, has long been proposed as a measure of similarity for biological sequences. Much of the mathematically rigorous work carried out to date on the properties of the D2 statistic has been restricted to the case of 'Bernoulli' sequences composed of independently and identically distributed letters. Here the properties of the distribution of this statistic are studied for the biologically more realistic case of Markovian sequences. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, which enables exact analytic formulae for the mean and variance to be derived. The formulae are confirmed using numerical simulations, and asymptotic approximations to the full distribution are tested.
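As a rough illustration of the quantity described in the abstract, the following minimal Python sketch counts word matches between two sequences. The function names `word_counts` and `d2`, the word-length parameter `k`, and the `periodic` flag are assumptions made for this example and do not come from the paper; in particular, treating the periodic boundary condition as words wrapping around the end of each sequence is one plausible reading of the abstract, not the authors' stated definition.

```python
from collections import Counter


def word_counts(seq, k, periodic=False):
    """Count every length-k word in seq.

    With periodic=True the sequence is treated as circular, so words may
    wrap around the end (an assumed reading of "periodic boundary conditions").
    """
    if k < 1 or k > len(seq):
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    if periodic:
        extended = seq + seq[:k - 1]   # append the first k-1 letters to wrap around
        n_words = len(seq)             # one word starts at every position
    else:
        extended = seq
        n_words = len(seq) - k + 1     # words must fit inside the sequence
    return Counter(extended[i:i + k] for i in range(n_words))


def d2(seq_a, seq_b, k, periodic=False):
    """Number of pairs (i, j) where the k-word at i in seq_a equals the k-word at j in seq_b."""
    counts_a = word_counts(seq_a, k, periodic)
    counts_b = word_counts(seq_b, k, periodic)
    # Each word contributes (occurrences in A) * (occurrences in B) matches.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "CGTA", k=2))
    print(d2("ACGTACGT", "CGTA", k=2, periodic=True))
```

Counting words once per sequence and multiplying the counts avoids comparing every pair of positions directly, which keeps the cost roughly linear in the sequence lengths for a fixed word length.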
Pages: 25-33
Number of pages: 9