New powerful statistics for alignment-free sequence comparison under a pattern transfer model

Cited by: 33
Authors
Liu, Xuemei [1 ,2 ]
Wan, Lin [1 ]
Li, Jing [1 ]
Reinert, Gesine [3 ]
Waterman, Michael S. [1 ,4 ]
Sun, Fengzhu [1 ,4 ]
Affiliations
[1] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[2] S China Univ Technol, Sch Phys, Guangzhou, Guangdong, Peoples R China
[3] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[4] Tsinghua Univ, TNLIST Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Alignment-free sequence comparison; D-2; Pattern transfer model; K-WORD MATCHES; PHYLOGENETIC TREE RECONSTRUCTION; FEATURE FREQUENCY PROFILES; COMPOSITION VECTOR METHOD; WHOLE-PROTEOME PHYLOGENY; REGULATORY SEQUENCES; ASYMPTOTIC-BEHAVIOR; DNA; GENOMES; DISSIMILARITY;
DOI
10.1016/j.jtbi.2011.06.020
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D-2 and its variants D-2* and D-2(s) showed that their power approaches a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D-2, D-2* and D-2(s) by comparing local sequence pairs and then summing over all the local sequence pairs of a certain length. We show that the new statistics are much more powerful than the corresponding original statistics and that their power tends to 1 as the sequence length tends to infinity under the pattern transfer model. (c) 2011 Elsevier Ltd. All rights reserved.
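Illustrative sketch (not taken from the paper): D-2 is commonly defined as the number of k-word matches between two sequences, that is, the sum over all words w of the product of the word counts in each sequence. The Python below computes that quantity and then a local variant that sums D-2 over all pairs of local windows. The function names, the `window` parameter, and the choice of non-overlapping windows are assumptions made for illustration; the paper's exact construction, and its D-2* and D-2(s) normalizations, may differ.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D-2: sum over words w of (count of w in a) * (count of w in b)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[w] for w, n in ca.items())


def windows(seq, length):
    """Non-overlapping windows of the given length (assumption of this
    sketch); a trailing piece shorter than `length` is dropped."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]


def local_d2_sum(a, b, k, window):
    """Sum D-2 over every pair of local windows, one from each sequence."""
    return sum(
        d2(u, v, k)
        for u, v in product(windows(a, window), windows(b, window))
    )


if __name__ == "__main__":
    x, y = "ACGTACGT", "ACGTACGT"
    print(d2(x, y, 2))               # global k-word match count
    print(local_d2_sum(x, y, 2, 4))  # sum over local window pairs
```

The local sum ignores matches that straddle window boundaries, so it generally differs from the global count; how windows are chosen is the main design decision in a sketch like this.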
Pages: 106-116
Page count: 11