The Invariance of Spectral-Kolmogorov-Type statistics for Estimating Genomic Similarity

Cited by: 1
Authors
Thornton, Micah [1, 2]
Affiliations
[1] Southern Methodist Univ, Dallas, TX 75275 USA
[2] Univ Texas Southwestern, Dallas, TX 75390 USA
Keywords
Genomic; Genetic Sequence; Similarity; Wavelet Transform; Kolmogorov-Smirnov; Invariance; Pedigree; Phylogeny; Clustering; Organisms
DOI
10.1109/ISMVL.2019.00021
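Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract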
Accurate and efficient comparison of genetic sequences is an important undertaking, with applications in medicine and in the hierarchical clustering of organisms. Comparison of full genomic sequences matters both across individuals of the same or different species, as in phylogenies, and within organisms, as in pedigrees. Given the number of genomes and their sizes, such comparisons are well known to be computationally intensive, which motivates the search for more efficient and accurate methods of genomic comparison. This paper introduces a metric computed by comparing the empirical distributions of the observed k-mers among one or more genetic sequences. The metric is a Kolmogorov-Smirnov-like statistic, since it is the supremum of the differences between the empirical distribution functions. Specifically, genetic sequences are represented as quaternary (radix-4) encoded sequences from which the metric is computed, and the metric is shown to produce similar clusterings when computed via spectral coefficients. Further, we investigate spectral methods, in particular the Walsh-Hadamard spectrum of the quaternary-encoded genetic sequence, and use the computed maximal spectral densities as a basis of comparison. The invariance of the Kolmogorov-Smirnov-like statistic when it is computed in the Walsh-Hadamard domain can enable faster comparisons through the use of spectral properties. For example, the convolution of two sequences becomes a simple multiplication in the spectral domain.
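The k-mer comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the digit assignment A=0, C=1, G=2, T=3, the ordering of k-mers by their radix-4 integer value, and every function name are hypothetical choices made here. The last two helpers illustrate the spectral property mentioned at the end of the abstract; for the Walsh-Hadamard transform, the convolution that turns into pointwise multiplication is the dyadic (XOR-indexed) one.

```python
from collections import Counter

# Assumed radix-4 digit assignment; the abstract does not specify one.
BASE_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_indices(seq, k):
    """Map each overlapping k-mer to its radix-4 integer value in [0, 4**k)."""
    digits = [BASE_TO_DIGIT[b] for b in seq.upper()]
    if len(digits) < k:
        raise ValueError("sequence is shorter than k")
    out = []
    for i in range(len(digits) - k + 1):
        value = 0
        for d in digits[i:i + k]:
            value = value * 4 + d
        out.append(value)
    return out


def kmer_ecdf(seq, k):
    """Empirical distribution function over the 4**k ordered k-mer values."""
    idx = kmer_indices(seq, k)
    counts = Counter(idx)
    cdf, running = [], 0
    for v in range(4 ** k):
        running += counts.get(v, 0)
        cdf.append(running / len(idx))
    return cdf


def ks_like_distance(seq_a, seq_b, k=2):
    """Supremum of absolute differences between the two k-mer ECDFs."""
    fa, fb = kmer_ecdf(seq_a, k), kmer_ecdf(seq_b, k)
    return max(abs(x - y) for x, y in zip(fa, fb))


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering)."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a


def dyadic_convolution(x, y):
    """Direct O(n^2) dyadic convolution: out[k] = sum_i x[i] * y[i XOR k]."""
    n = len(x)
    return [sum(x[i] * y[i ^ k] for i in range(n)) for k in range(n)]
```

With these helpers, `ks_like_distance(seq_a, seq_b, k)` gives a value between 0 and 1 for two sequences, and `fwht(dyadic_convolution(x, y))` should equal the elementwise product of `fwht(x)` and `fwht(y)` for equal-length inputs whose length is a power of two.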
Pages: 73-78
Number of pages: 6