A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words

Times cited: 100
Authors
Wu, TJ [1]
Burke, JP
Davison, DB
Affiliations
[1] Natl Donghwa Univ, Dept Math Appl, Hualien, Taiwan
[2] Univ Houston, Dept Math, Houston, TX 77204 USA
[3] Univ Houston, Dept Biochem & Biophys Sci, Houston, TX 77204 USA
Keywords
DNA sequences; dissimilarity measures; Mahalanobis distance; standardized Euclidean distance
DOI
10.2307/2533509
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
A number of algorithms exist for searching genetic databases for biologically significant similarities in DNA sequences. Past research has shown that word-based search tools are computationally efficient and can find similarities or dissimilarities invisible to other algorithms such as FASTA. We characterize a family of word-based dissimilarity measures that define the distance between two sequences by simultaneously comparing the frequencies of all subsequences of n adjacent letters (i.e., n-words) in the two sequences. Applications to real data demonstrate that currently used word-based methods that rely on Euclidean distance can be significantly improved by using Mahalanobis distance, which accounts for both the variances of and the covariances between n-word frequencies. Furthermore, where Mahalanobis distance may be too difficult to compute, standardized Euclidean distance, which corrects only for the variances of n-word frequencies, still performs better than Euclidean distance. We also consider a simple way of combining the distances obtained for different word lengths n, with the goal of obtaining a single measure of dissimilarity between two DNA sequences. The performance ranking of the three distances still holds for their combined counterparts. All results in this paper apply to amino acid sequences with minor modifications.
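To make the three distances in the abstract concrete, here is a minimal sketch in Python with NumPy, under assumptions that are not from the source. The abstract does not say how the variances and covariances of n-word frequencies are obtained, so this sketch estimates them empirically from a hypothetical reference collection of sequences (`reference_moments`). It counts overlapping n-words over the ACGT alphabet and skips windows containing other symbols. Because the frequencies of one sequence sum to 1, the covariance matrix is singular, so `mahalanobis` uses the Moore-Penrose pseudo-inverse. `combined_distance` simply sums the per-n distances, which is one simple combination and not necessarily the rule used in the paper. All function names are hypothetical.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"  # assumption: DNA only; windows with other symbols are skipped


def word_frequencies(seq: str, n: int) -> np.ndarray:
    """Relative frequencies of all 4**n overlapping n-words, in lexicographic order."""
    index = {"".join(w): i for i, w in enumerate(product(ALPHABET, repeat=n))}
    counts = np.zeros(len(index))
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n].upper())
        if j is not None:  # windows containing N or other non-ACGT symbols are skipped
            counts[j] += 1
    total = counts.sum()
    if total == 0:
        raise ValueError("sequence has no ACGT n-words")
    return counts / total


def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    """Plain Euclidean distance between two frequency vectors."""
    return float(np.sqrt(np.sum((x - y) ** 2)))


def standardized_euclidean(x: np.ndarray, y: np.ndarray, var: np.ndarray) -> float:
    """Each squared difference is divided by that word's variance (variances only,
    no covariances); words with zero variance are dropped."""
    keep = var > 0
    return float(np.sqrt(np.sum((x[keep] - y[keep]) ** 2 / var[keep])))


def mahalanobis(x: np.ndarray, y: np.ndarray, cov: np.ndarray) -> float:
    """sqrt(d^T C^+ d) with d = x - y. C^+ is the Moore-Penrose pseudo-inverse,
    because n-word frequencies sum to 1 and C is therefore singular."""
    d = x - y
    q = float(d @ np.linalg.pinv(cov) @ d)
    return float(np.sqrt(max(q, 0.0)))  # clip tiny negative round-off


def reference_moments(reference: list[str], n: int) -> tuple[np.ndarray, np.ndarray]:
    """ASSUMPTION: per-word variances and the full covariance matrix are
    estimated empirically from a hypothetical reference set of sequences."""
    freqs = np.array([word_frequencies(s, n) for s in reference])
    cov = np.cov(freqs, rowvar=False)  # rows are sequences, columns are n-words
    return np.diag(cov).copy(), cov


def combined_distance(a: str, b: str, reference: list[str],
                      ns=(1, 2, 3), kind: str = "mahalanobis") -> float:
    """ASSUMPTION: distances for several word lengths are combined by an unweighted sum."""
    total = 0.0
    for n in ns:
        x, y = word_frequencies(a, n), word_frequencies(b, n)
        var, cov = reference_moments(reference, n)
        if kind == "euclidean":
            total += euclidean(x, y)
        elif kind == "standardized":
            total += standardized_euclidean(x, y, var)
        else:
            total += mahalanobis(x, y, cov)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # hypothetical reference collection: 100 random sequences of length 200
    reference = ["".join(rng.choice(list(ALPHABET), size=200)) for _ in range(100)]
    a, b = reference[0], reference[1]
    x, y = word_frequencies(a, 2), word_frequencies(b, 2)
    var, cov = reference_moments(reference, 2)
    print("Euclidean:             ", euclidean(x, y))
    print("standardized Euclidean:", standardized_euclidean(x, y, var))
    print("Mahalanobis:           ", mahalanobis(x, y, cov))
    print("combined (n = 1..3):   ", combined_distance(a, b, reference))
```

When the estimated covariance has rank 4**n - 1, the pseudo-inverse form gives the same value as dropping any one n-word and inverting the reduced covariance matrix, so the choice of pseudo-inverse is a convenience rather than a change to the distance.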
Pages: 1431-1439
Number of pages: 9