New powerful statistics for alignment-free sequence comparison under a pattern transfer model

Cited by: 33
Authors
Liu, Xuemei [1, 2]
Wan, Lin [1]
Li, Jing [1]
Reinert, Gesine [3]
Waterman, Michael S. [1, 4]
Sun, Fengzhu [1, 4]
Affiliations
[1] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[2] S China Univ Technol, Sch Phys, Guangzhou, Guangdong, Peoples R China
[3] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[4] Tsinghua Univ, TNLIST Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Alignment-free sequence comparison; D2; Pattern transfer model; k-word matches; Phylogenetic tree reconstruction; Feature frequency profiles; Composition vector method; Whole-proteome phylogeny; Regulatory sequences; Asymptotic behavior; DNA; Genomes; Dissimilarity
DOI
10.1016/j.jtbi.2011.06.020
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies of the widely used alignment-free statistic D2 and its variants D2* and D2S showed that, under a pattern transfer model, their power approaches a limit smaller than 1 as the sequence length tends to infinity. We develop new alignment-free statistics based on D2, D2* and D2S by comparing local sequence pairs and then summing over all local sequence pairs of a certain length. We show that the new statistics are much more powerful than the corresponding original statistics and that their power tends to 1 as the sequence length tends to infinity under the pattern transfer model. (c) 2011 Elsevier Ltd. All rights reserved.
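
The idea of summing a word-match statistic over local sequence pairs can be illustrated with a minimal Python sketch. It uses the standard definition of D2 as the number of k-word matches between two sequences (the inner product of their k-word count vectors). The windowing scheme is an assumption made only for illustration: the abstract does not say how local pairs are formed, so the helper names (`kmer_counts`, `d2`, `windows`, `local_d2_sum`) and the choice of non-overlapping windows are hypothetical and are not the authors' definitions. The D2* and D2S variants are not covered.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-word (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2: number of k-word matches between the two sequences,
    i.e. the inner product of their k-word count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


def windows(seq, size):
    """Split seq into non-overlapping windows of the given size.
    ASSUMPTION: the windowing scheme is illustrative only; a trailing
    fragment shorter than `size` is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def local_d2_sum(seq_a, seq_b, k, window):
    """Sum D2 over all pairs of local windows, one window from each sequence.
    ASSUMPTION: a simplified stand-in for the locally summed statistics
    described in the abstract, not the published definition."""
    return sum(
        d2(win_a, win_b, k)
        for win_a in windows(seq_a, window)
        for win_b in windows(seq_b, window)
    )


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGTACGT"
    print(d2(a, b, k=2))
    print(local_d2_sum(a, b, k=2, window=4))
```

In this sketch the window length plays the role of the "certain length" of the local sequence pairs. How that length should be chosen, and how the summed statistic is normalized or tested for significance, is not stated in the abstract and is left open here.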
Pages: 106-116
Number of pages: 11