New powerful statistics for alignment-free sequence comparison under a pattern transfer model

Cited by: 33
Authors
Liu, Xuemei [1 ,2 ]
Wan, Lin [1 ]
Li, Jing [1 ]
Reinert, Gesine [3 ]
Waterman, Michael S. [1 ,4 ]
Sun, Fengzhu [1 ,4 ]
Affiliations
[1] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[2] S China Univ Technol, Sch Phys, Guangzhou, Guangdong, Peoples R China
[3] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[4] Tsinghua Univ, TNLIST Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Alignment-free sequence comparison; D-2; Pattern transfer model; K-WORD MATCHES; PHYLOGENETIC TREE RECONSTRUCTION; FEATURE FREQUENCY PROFILES; COMPOSITION VECTOR METHOD; WHOLE-PROTEOME PHYLOGENY; REGULATORY SEQUENCES; ASYMPTOTIC-BEHAVIOR; DNA; GENOMES; DISSIMILARITY;
DOI
10.1016/j.jtbi.2011.06.020
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies of the power of the widely used alignment-free statistic D2 and its variants D2* and D2S showed that, under a pattern transfer model, their power approaches a limit smaller than 1 as the sequence length tends to infinity. We develop new alignment-free statistics based on D2, D2* and D2S by comparing local sequence pairs and then summing over all local sequence pairs of a given length. We show that the new statistics are much more powerful than the corresponding original statistics, and that their power tends to 1 as the sequence length tends to infinity under the pattern transfer model. (c) 2011 Elsevier Ltd. All rights reserved.
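As a rough illustration of the idea in the abstract, the sketch below computes the basic D2 statistic (listed as D-2 in the keywords), taken here as the sum over all k-words of the product of their counts in the two sequences, and a local variant that sums D2 over pairs of fixed-length windows. The function names, the window length m, the step parameter, and the unnormalized form are assumptions made for illustration; the paper's actual statistics, including the D2* and D2S variants, may be defined and normalized differently.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Basic D2: sum over k-words of count_in_a * count_in_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


def windows(seq, m, step=1):
    """All length-m windows of seq, taken every `step` positions."""
    return [seq[i:i + m] for i in range(0, len(seq) - m + 1, step)]


def local_d2_sum(seq_a, seq_b, k, m, step=1):
    """Hypothetical local variant: sum D2 over all pairs of local windows.

    This is an assumed, unnormalized form of "summing over all local
    sequence pairs of a given length", not the paper's exact definition.
    """
    return sum(
        d2(win_a, win_b, k)
        for win_a in windows(seq_a, m, step)
        for win_b in windows(seq_b, m, step)
    )


if __name__ == "__main__":
    a = "ACGTACGTGACG"
    b = "TTACGTGACGAA"
    print("D2:", d2(a, b, k=3))
    print("local D2 sum:", local_d2_sum(a, b, k=3, m=6, step=2))
```

The all-pairs loop is quadratic in the number of windows, so for long sequences a larger step or precomputed per-window counts would be a natural refinement.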
Pages: 106-116
Number of pages: 11