FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Cited by: 66
Authors
Deorowicz, Sebastian [1]
Debudaj-Grabysz, Agnieszka [1]
Gudys, Adam [1]
Affiliations
[1] Silesian Tech Univ, Inst Informat, Akad 16, PL-44100 Gliwice, Poland
Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Keywords
CHAINED GUIDE TREES; ALGORITHM; QUALITY; PERFORMANCE; BENCHMARK;
DOI
10.1038/srep33964
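Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract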
Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein families databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. What matters is that its implementation is highly optimized and parallelized to make the most of modern computer platforms. Thanks to the above, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT for datasets exceeding a few thousand sequences. Quality does not compromise on time or memory requirements, which are an order of magnitude lower than those in the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
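The abstract names the longest common subsequence (LCS) as the measure used for pairwise similarity. The sketch below illustrates that measure with a plain dynamic-programming routine. It is not the optimized, parallelized implementation the abstract describes. The function names and the normalization by the shorter sequence length are assumptions made for illustration only and are not taken from the paper.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic programming that keeps only one row.
    """
    if len(b) > len(a):
        a, b = b, a  # iterate over the longer string, keep the DP row short
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching residues extend the best subsequence of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization (an assumption, not the paper's formula):
    LCS length divided by the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0
```

Calling `lcs_similarity` on every pair of sequences would give a similarity matrix of the kind a progressive aligner could use to build a guide tree. The quadratic number of pairs is why a fast LCS routine matters for families with hundreds of thousands of sequences.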
Pages: 13