Kalign 3: multiple sequence alignment of large datasets

Cited by: 73
Author
Lassmann, Timo [1]
Affiliation
[1] Univ Western Australia, Telethon Kids Inst, Nedlands, WA, Australia
Keywords
ALGORITHM; BENCHMARK
DOI
10.1093/bioinformatics/btz795
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign's original design specifications. Here we present a completely rewritten and updated version to meet current and future alignment challenges.
Results: Kalign now uses a SIMD (single instruction, multiple data) accelerated version of Gene Myers' bit-parallel algorithm to estimate pairwise distances, and adopts a sequence embedding strategy together with the bisecting K-means algorithm to rapidly construct guide trees for thousands of sequences. The new version maintains high alignment accuracy on both protein and nucleotide alignments and scales better than other MSA tools.
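The bit-parallel idea mentioned in the abstract can be illustrated with a minimal sketch. This is not Kalign's code: it is a scalar Python version of the Myers bit-vector edit distance in the commonly used Hyyrö formulation, with Python's arbitrary-precision integers standing in for SIMD registers. The function name `myers_edit_distance` is a hypothetical choice made for this example.

```python
def myers_edit_distance(pattern: str, text: str) -> int:
    """Global (Levenshtein) edit distance via Myers' bit-vector recurrence.

    Illustrative sketch only; Kalign 3 itself uses a SIMD implementation in C.
    """
    m = len(pattern)
    if m == 0:
        return len(text)

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1      # keep only the m low bits
    high = 1 << (m - 1)      # bit of the last pattern row
    pv, mv = mask, 0         # vertical +1 / -1 delta vectors
    score = m                # distance between pattern and empty text

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas

        # The last row's horizontal delta updates the running score.
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shift in a 1 on ph: the top boundary row grows by one per column.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv

    return score


if __name__ == "__main__":
    print(myers_edit_distance("kitten", "sitting"))
```

Each text character is processed with a constant number of word-wide operations, which is what makes the method attractive for estimating many pairwise distances; a SIMD version applies the same recurrence to wider registers.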
Pages: 1928-1929
Number of pages: 2