The Distribution of Short Word Match Counts between Markovian Sequences

Authors
Burden, Conrad J. [1]
Leopardi, Paul [1]
Foret, Sylvain [2]
Affiliations
[1] Australian Natl Univ, Math Sci Inst, GPO Box 4, Canberra, ACT 0200, Australia
[2] Australian Natl Univ, Res Sch Biol, Canberra, ACT 0200, Australia
Funding
Australian Research Council;
Keywords
Word Matches; Biological Sequence Comparison;
DOI
Not available
Chinese Library Classification number
R-058 [];
Subject classification code
Abstract
The D2 statistic, which counts the number of word matches between two given sequences, has long been proposed as a measure of similarity for biological sequences. Much of the mathematically rigorous work carried out to date on the properties of the D2 statistic has been restricted to the case of 'Bernoulli' sequences composed of independently and identically distributed letters. Here the properties of the distribution of this statistic are studied for the biologically more realistic case of Markovian sequences. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, which enables exact analytic formulae for the mean and variance to be derived. The formulae are confirmed using numerical simulations, and asymptotic approximations to the full distribution are tested.
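
To make the quantity in the abstract concrete, here is a minimal Python sketch of a word match count between two sequences. It is an illustration only, not the authors' code: the function name `d2_statistic`, the parameter names, and the `periodic` flag are assumptions introduced here. The flag wraps each sequence around its end, as one plausible reading of the "periodic boundary conditions" mentioned in the abstract.

```python
from collections import Counter


def d2_statistic(seq_a, seq_b, k, periodic=False):
    """Count matching k-letter words between seq_a and seq_b.

    Every pair (i, j) with word i of seq_a equal to word j of seq_b
    contributes 1 to the total. With periodic=True each sequence is
    treated as circular, so words may wrap around the end (an assumed
    reading of periodic boundary conditions).
    """
    if k < 1 or k > min(len(seq_a), len(seq_b)):
        raise ValueError("k must satisfy 1 <= k <= length of both sequences")

    def words(seq):
        n = len(seq)
        if periodic:
            # Append the first k-1 letters so the last words wrap around.
            extended = seq + seq[: k - 1]
            return [extended[i : i + k] for i in range(n)]
        return [seq[i : i + k] for i in range(n - k + 1)]

    counts_a = Counter(words(seq_a))
    counts_b = Counter(words(seq_b))
    # Sum over distinct words of (occurrences in A) * (occurrences in B).
    return sum(c * counts_b[w] for w, c in counts_a.items())


if __name__ == "__main__":
    print(d2_statistic("ACGT", "ACGT", 2))                 # 3
    print(d2_statistic("ACGT", "ACGT", 2, periodic=True))  # 4
```

Counting words per sequence first and then multiplying the counts avoids comparing every pair of positions directly, while giving the same total.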
Pages: 25-33
Number of pages: 9