Alignment-Free Sequence Comparison Based on Next-Generation Sequencing Reads

Cited by: 57
Authors
Song, Kai [1]
Ren, Jie [1]
Zhai, Zhiyuan [2]
Liu, Xuemei [3]
Deng, Minghua [1]
Sun, Fengzhu [4, 5]
Affiliations
[1] Peking Univ, Sch Math, Beijing 100871, Peoples R China
[2] Shandong Univ, Sch Math, Jinan, Shandong, Peoples R China
[3] S China Univ Technol, Sch Phys, Guangzhou, Guangdong, Peoples R China
[4] Univ So Calif, Los Angeles, CA 90089 USA
[5] Tsinghua Univ, TNLIST, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
HMM; NGS; normal approximation; statistical power; word count statistics; feature frequency profiles; similarity
DOI
10.1089/cmb.2012.0228
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, $D_2$, $D_2^*$, and $D_2^S$, both theoretically and by simulations. Theoretical formulas are derived for the power of detecting the relationship between two sequences related through a common motif model. It is shown that both $D_2^*$ and $D_2^S$ outperform $D_2$ for detecting the relationship between two sequences based on NGS data. We then study the effects of tuple length, read length, coverage, and sequencing error on the power of $D_2^*$ and $D_2^S$. Finally, the corresponding dissimilarity measures $d_2$, $d_2^*$, and $d_2^S$ are used first to cluster five mammalian species with known phylogenetic relationships, and then to cluster 13 tree species whose complete genome sequences are not available, using NGS shotgun reads. The clustering results using $d_2^S$ are consistent with biological knowledge for both the 5 mammalian and the 13 tree species. Thus, the statistic $d_2^S$ provides a powerful alignment-free comparison tool to study the relationships among different organisms based on NGS read data without assembly.
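The abstract names the three word-count statistics without defining them. As a rough illustration only, the sketch below computes D2 (the inner product of k-tuple counts) together with centered variants in the spirit of D2* and D2S, directly from two sets of reads. Several things here are assumptions and not taken from the record above: the function names (`kmer_counts`, `base_freqs`, `word_prob`, `d2_statistics`) are hypothetical, the background model is a simple i.i.d. model with base frequencies estimated separately from each read set, and reverse-complement strands and read-specific corrections are ignored. It should be read as a minimal sketch, not as the authors' implementation.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"

def kmer_counts(reads, k):
    """Count every length-k word made only of A/C/G/T across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set(ALPHABET):
                counts[word] += 1
    return counts

def base_freqs(reads):
    """Estimate single-base frequencies (assumed i.i.d. background model)."""
    letters = Counter(c for read in reads for c in read if c in ALPHABET)
    total = sum(letters.values())
    return {b: letters[b] / total for b in ALPHABET}

def word_prob(word, freqs):
    """Probability of a word under the i.i.d. background model."""
    p = 1.0
    for c in word:
        p *= freqs[c]
    return p

def d2_statistics(reads_x, reads_y, k):
    """Return (D2, D2*-like, D2S-like) for two read sets.

    Centered counts subtract the expected count n * p_w, where n is the
    total number of k-tuples observed in the sample (an assumption of
    this sketch).
    """
    cx, cy = kmer_counts(reads_x, k), kmer_counts(reads_y, k)
    fx, fy = base_freqs(reads_x), base_freqs(reads_y)
    nx, ny = sum(cx.values()), sum(cy.values())
    d2 = d2_star = d2_s = 0.0
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        x, y = cx[w], cy[w]
        ex, ey = nx * word_prob(w, fx), ny * word_prob(w, fy)
        xt, yt = x - ex, y - ey          # centered counts
        d2 += x * y                      # plain inner product of counts
        if ex > 0 and ey > 0:
            d2_star += xt * yt / sqrt(ex * ey)
        norm = sqrt(xt * xt + yt * yt)
        if norm > 0:
            d2_s += xt * yt / norm
    return d2, d2_star, d2_s

if __name__ == "__main__":
    reads_a = ["ACGTACGT", "CGTACG"]
    reads_b = ["ACGTTGCA", "TGCAAC"]
    print(d2_statistics(reads_a, reads_b, k=3))
```

Iterating over all 4^k words keeps the centering exact for words that never occur in one sample; for large k a sparse iteration over observed words would be the more practical choice.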
Pages: 64-79
Number of pages: 16