FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Cited by: 66
Authors
Deorowicz, Sebastian [1]
Debudaj-Grabysz, Agnieszka [1]
Gudys, Adam [1]
Affiliations
[1] Silesian Tech Univ, Inst Informat, Akad 16, PL-44100 Gliwice, Poland
Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Keywords
CHAINED GUIDE TREES; ALGORITHM; QUALITY; PERFORMANCE; BENCHMARK;
DOI
10.1038/srep33964
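Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes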
07 ; 0710 ; 09 ;
Abstract
Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein families databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. What matters is that its implementation is highly optimized and parallelized to make the most of modern computer platforms. Thanks to the above, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT for datasets exceeding a few thousand sequences. Quality does not compromise on time or memory requirements, which are an order of magnitude lower than those in the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
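The abstract names the longest common subsequence (LCS) as the measure behind pairwise similarities. As a rough illustration of that measure only, the sketch below computes the LCS length with the textbook dynamic-programming recurrence. It is not FAMSA's optimized, parallelized implementation; the function names `lcs_length` and `lcs_similarity`, and the normalization by the longer sequence length, are assumptions made for this example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook O(len(a) * len(b)) dynamic programming that keeps a single
    previous row, so memory is proportional to the shorter sequence.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer one, store rows for the shorter
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: LCS length over the longer sequence length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Two short, made-up peptide fragments used purely for illustration.
    print(lcs_length("MKTAYIAKQR", "MKTAIAKQ"))
    print(lcs_similarity("MKTAYIAKQR", "MKTAIAKQ"))
```

A progressive aligner would evaluate such a measure for sequence pairs before building a guide tree; the quadratic cost per pair shown here is why a production tool needs a much faster formulation for families with hundreds of thousands of sequences.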
Pages: 13