andi: Fast and accurate estimation of evolutionary distances between closely related genomes

Cited by: 59
Authors
Haubold, Bernhard [1]
Kloetzl, Fabian [1, 2]
Pfaffelhuber, Peter [3]
Affiliations
[1] Max Planck Inst Evolut Biol, Dept Evolutionary Genet, D-24306 Plon, Germany
[2] Med Univ Lubeck, Inst Neuro & Bioinformat, D-23562 Lubeck, Germany
[3] Univ Freiburg, Math Inst, Math Stochast, Freiburg, Germany
Keywords
COMMON SUBSTRING APPROACH; MULTIPLE ALIGNMENT; SEQUENCE; RECONSTRUCTION; RECOMBINATION; ORGANISMS;
DOI
10.1093/bioinformatics/btu815
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes. Results: Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays and our implementation requires approximately only 1 s and 45 MB RAM/Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae.
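The anchoring idea in the abstract can be illustrated with a minimal sketch. It assumes the exact matches (anchors) have already been found and are given as hypothetical `(pos_a, pos_b, length)` tuples sorted by position; the function name `anchor_distance` is illustrative and not taken from the andi implementation. Two neighbouring anchors are treated as a pair only when they are equidistant in both sequences, so the segment between them can be compared without gaps. The Jukes-Cantor correction applied at the end is an assumption made for this example, since the abstract does not name a substitution model.

```python
import math


def anchor_distance(seq_a, seq_b, anchors):
    """Estimate a distance from ungapped segments flanked by paired anchors.

    anchors: list of (pos_a, pos_b, length) exact matches, sorted by pos_a.
    Returns NaN when no anchor pair is found (no homologous region detected).
    """
    mismatches = 0
    compared = 0
    counted = set()  # anchors whose (identical) bases were already counted
    for i in range(len(anchors) - 1):
        a1, b1, n1 = anchors[i]
        a2, b2, n2 = anchors[i + 1]
        gap_a = a2 - (a1 + n1)
        gap_b = b2 - (b1 + n1)
        # Pair the anchors only if they are equidistant in both sequences.
        if gap_a < 0 or gap_a != gap_b:
            continue
        for j, n in ((i, n1), (i + 1, n2)):
            if j not in counted:
                counted.add(j)
                compared += n
        start_a, start_b = a1 + n1, b1 + n1
        mismatches += sum(
            seq_a[start_a + k] != seq_b[start_b + k] for k in range(gap_a)
        )
        compared += gap_a
    if compared == 0:
        return float("nan")
    p = mismatches / compared
    if p >= 0.75:
        return float("inf")  # correction undefined at saturation
    # Jukes-Cantor correction (assumed here, not stated in the abstract).
    return -0.75 * math.log(1 - 4 * p / 3)
```

For example, `anchor_distance("ACGTACGTAAGGCCTT", "ACGTACGTTAGGCCTT", [(0, 0, 8), (9, 9, 7)])` compares 16 positions with one mismatch between the two anchors. Anchors that are not equidistant are skipped, which is how this sketch mimics the separation of homologous from non-homologous regions described in the abstract.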
Pages: 1169-1175
Number of pages: 7