New powerful statistics for alignment-free sequence comparison under a pattern transfer model

Times cited: 33
Authors
Liu, Xuemei [1, 2]
Wan, Lin [1]
Li, Jing [1]
Reinert, Gesine [3]
Waterman, Michael S. [1, 4]
Sun, Fengzhu [1, 4]
Affiliations
[1] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[2] S China Univ Technol, Sch Phys, Guangzhou, Guangdong, Peoples R China
[3] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[4] Tsinghua Univ, TNLIST Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Alignment-free sequence comparison; D2; Pattern transfer model; K-WORD MATCHES; PHYLOGENETIC TREE RECONSTRUCTION; FEATURE FREQUENCY PROFILES; COMPOSITION VECTOR METHOD; WHOLE-PROTEOME PHYLOGENY; REGULATORY SEQUENCES; ASYMPTOTIC-BEHAVIOR; DNA; GENOMES; DISSIMILARITY
DOI
10.1016/j.jtbi.2011.06.020
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of the widely used alignment-free comparison statistic D2 and its variants D2* and D2S showed that, under a pattern transfer model, their power tends to a limit smaller than 1 as the sequence length tends to infinity. We develop new alignment-free statistics based on D2, D2* and D2S by comparing local sequence pairs and then summing over all local sequence pairs of a certain length. We show that the new statistics are much more powerful than the corresponding original statistics and that their power tends to 1 as the sequence length tends to infinity under the pattern transfer model. (c) 2011 Elsevier Ltd. All rights reserved.
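The statistics named in the abstract can be illustrated with a minimal Python sketch. D2 counts k-word matches, D2 = sum over words w of X_w * Y_w, where X_w and Y_w are the numbers of occurrences of the k-word w in the two sequences. D2S uses centred counts X~_w = X_w - (n - k + 1) p_w and sums X~_w * Y~_w / sqrt(X~_w^2 + Y~_w^2). This sketch makes two assumptions. First, p_w comes from an i.i.d. background model with letter frequencies pooled over both sequences. Second, the hypothetical helper `local_sum` cuts each sequence into non-overlapping windows of one fixed length and adds the statistic over all window pairs. The paper's exact construction of its local statistics may differ, and D2* is left out for brevity.

```python
import math
from collections import Counter
from itertools import product
from typing import Callable


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> float:
    """D2 = sum over k-words w of X_w * Y_w (number of k-word matches)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return float(sum(cx[w] * cy[w] for w in cx))  # Counter returns 0 if absent


def d2s(x: str, y: str, k: int, alphabet: str = "ACGT") -> float:
    """D2S with centred counts X~_w = X_w - (n - k + 1) * p_w.

    Assumption: p_w is the product of letter frequencies pooled over
    x and y (an i.i.d. background model).
    """
    pooled = Counter(x + y)
    total = sum(pooled[a] for a in alphabet)
    freq = {a: pooled[a] / total for a in alphabet}
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    nx, ny = len(x) - k + 1, len(y) - k + 1
    score = 0.0
    # Every word of the alphabet contributes, including unseen ones.
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        p_w = math.prod(freq[a] for a in letters)
        xt = cx[w] - nx * p_w
        yt = cy[w] - ny * p_w
        denom = math.hypot(xt, yt)  # sqrt(xt^2 + yt^2)
        if denom > 0:
            score += xt * yt / denom
    return score


def local_sum(x: str, y: str, window: int,
              stat: Callable[[str, str], float]) -> float:
    """Hypothetical local variant: apply `stat` to every pair of
    non-overlapping windows of length `window` and add the results."""
    xs = [x[i:i + window] for i in range(0, len(x) - window + 1, window)]
    ys = [y[j:j + window] for j in range(0, len(y) - window + 1, window)]
    return sum(stat(a, b) for a in xs for b in ys)


if __name__ == "__main__":
    x, y = "ACGTACGT", "ACGTACGT"
    print(d2(x, y, 2))                                   # 13.0
    print(local_sum(x, y, 4, lambda a, b: d2(a, b, 2)))  # 12.0
    print(d2s("ACGTTGCA", "AACCGGTT", 2))
```

In the usage example, the global D2 of 13 includes the word "TA" that spans the two windows. The windowed sum drops that word and gives 12. This shows that the local statistic is a different quantity and not a rescaling of D2.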
Pages: 106-116
Number of pages: 11
Related papers
50 in total
  • [21] Alignment-Free Sequence Comparison over Hadoop for Computational Biology
    Cattaneo, Giuseppe
    Petrillo, Umberto Ferraro
    Giancarlo, Raffaele
    Roscigno, Gianluca
    2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, 2015: 184 - 192
  • [22] Statistical considerations underpinning an alignment-free sequence comparison method
    Jing, Junmei
    Burden, Conrad J.
    Foret, Sylvain
    Wilson, Susan R.
    JOURNAL OF THE KOREAN STATISTICAL SOCIETY, 2010, 39 (03) : 325 - 335
  • [23] Variable length local decoding and alignment-free sequence comparison
    Didier, Gilles
    Corel, Eduardo
    Laprevotte, Ivan
    Grossmann, Alex
    Landes-Devauchelle, Claudine
    THEORETICAL COMPUTER SCIENCE, 2012, 462 : 1 - 11
  • [25] An improved alignment-free model for DNA sequence similarity metric
    Bao, Junpeng
    Yuan, Ruiyu
    Bao, Zhe
    BMC BIOINFORMATICS, 2014, 15
  • [26] Alignment-free Sequence Comparison for Biologically Realistic Sequences of Moderate Length
    Burden, Conrad J.
    Jing, Junmei
    Wilson, Susan R.
    STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2012, 11 (01)
  • [27] SENSE: Siamese neural network for sequence embedding and alignment-free comparison
    Zheng, Wei
    Yang, Le
    Genco, Robert J.
    Wactawski-Wende, Jean
    Buck, Michael
    Sun, Yijun
    BIOINFORMATICS, 2019, 35 (11) : 1820 - 1828
  • [28] MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics
    Pizzi, Cinzia
    ALGORITHMS FOR MOLECULAR BIOLOGY, 2016, 11
  • [29] Alignment-free genomic sequence comparison using FCGR and signal processing
    Lichtblau, Daniel
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [30] Weighted measures based on maximizing deviation for alignment-free sequence comparison
    Qian, Kun
    Luan, Yihui
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2017, 481 : 235 - 242