The Invariance of Spectral-Kolmogorov-Type statistics for Estimating Genomic Similarity

Cited by: 1
Author
Thornton, Micah [1, 2]
Affiliations
[1] Southern Methodist Univ, Dallas, TX 75275 USA
[2] Univ Texas Southwestern, Dallas, TX 75390 USA
Keywords
Genomic; Genetic; Sequence; Similarity; Wavelet Transform; Kolmogorov-Smirnov; Invariance; Pedigree; Phylogeny; Clustering; Organisms
DOI
10.1109/ISMVL.2019.00021
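Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract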
Accurate and efficient comparison of genetic sequences is an important undertaking, with applications in medicine and in the hierarchical clustering of organisms. Comparison of full genomic sequences matters both across individuals of the same or different species, as in phylogenies, and within organisms, as in pedigrees. Given the number of genomes and their respective sizes, such comparisons are well known to be computationally intensive, which motivates the search for more efficient and accurate means of genomic comparison. This paper introduces a metric computed by comparing the empirical distributions of the observed k-mers among one or more genetic sequences. The metric is a Kolmogorov-Smirnov-like statistic, since it is the supremum of the differences between the empirical distribution functions. Specifically, genetic sequences are represented as quaternary (radix-4) encoded sequences, which allows the metric to be computed, and the metric is shown to produce similar clusterings when computed via spectral coefficients. Further, we investigate spectral methods, in particular the Walsh-Hadamard spectrum of the quaternary-encoded genetic sequence, and use the computed maximal spectral densities as a basis of comparison. The invariance of the Kolmogorov-Smirnov-like statistic when it is computed in the Walsh-Hadamard domain can enable faster comparisons through the use of spectral properties. For example, the convolution of two sequences becomes a simple multiplication in the spectral domain.
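The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the base-to-digit mapping (A=0, C=1, G=2, T=3), the ordering of k-mers by their radix-4 value, and all function names are assumptions made for illustration. The first part computes the supremum of the differences between two empirical k-mer distribution functions. The second part is an unnormalized fast Walsh-Hadamard transform, for which the multiplication property holds for dyadic (XOR-indexed) convolution.

```python
# Assumed quaternary (radix-4) encoding; the paper may use a different mapping.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Turn a DNA string into a list of radix-4 digits."""
    return [ENCODING[base] for base in seq.upper()]


def kmer_indices(digits, k):
    """Read every length-k window as a base-4 number in [0, 4**k)."""
    indices = []
    for i in range(len(digits) - k + 1):
        value = 0
        for d in digits[i:i + k]:
            value = value * 4 + d
        indices.append(value)
    return indices


def empirical_cdf(indices, k):
    """Empirical distribution function over the 4**k ordered k-mers."""
    counts = [0] * (4 ** k)
    for idx in indices:
        counts[idx] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(indices))
    return cdf


def ks_like_statistic(seq_a, seq_b, k=2):
    """Supremum of the absolute differences between the two k-mer ECDFs."""
    cdf_a = empirical_cdf(kmer_indices(encode(seq_a), k), k)
    cdf_b = empirical_cdf(kmer_indices(encode(seq_b), k), k)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                x, y = out[i], out[i + h]
                out[i], out[i + h] = x + y, x - y
        h *= 2
    return out


def dyadic_convolution(a, b):
    """Convolution with XOR index arithmetic: c[i] = sum_j a[j] * b[i ^ j]."""
    n = len(a)
    return [sum(a[j] * b[i ^ j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    print(ks_like_statistic("ACGTACGTAC", "ACGTTTGTAC", k=2))
    a, b = encode("ACGTACGT"), encode("TTGACCAG")
    # Transform of the dyadic convolution equals the pointwise product of transforms.
    print(fwht(dyadic_convolution(a, b)) == [x * y for x, y in zip(fwht(a), fwht(b))])
```

The direct `dyadic_convolution` above costs O(n^2) operations, while the transform route costs O(n log n), which is the kind of saving the abstract alludes to.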
Pages: 73-78
Page count: 6