FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Cited by: 66

Authors:

Deorowicz, Sebastian [1]

Debudaj-Grabysz, Agnieszka [1]

Gudys, Adam [1]

Affiliations:

[1] Silesian Tech Univ, Inst Informat, Akad 16, PL-44100 Gliwice, Poland

Source:

SCIENTIFIC REPORTS | 2016 / Volume 6

Keywords:

CHAINED GUIDE TREES; ALGORITHM; QUALITY; PERFORMANCE; BENCHMARK;

DOI:

10.1038/srep33964

Chinese Library Classification (CLC) codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein families databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. What matters is that its implementation is highly optimized and parallelized to make the most of modern computer platforms. Thanks to the above, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT for datasets exceeding a few thousand sequences. Quality does not compromise on time or memory requirements, which are an order of magnitude lower than those in the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
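The longest common subsequence (LCS) measure that the abstract names for pairwise similarity can be illustrated with a minimal Python sketch. It uses plain row-wise dynamic programming, not the optimized and parallelized implementation the abstract describes. The function names and the normalization by the shorter sequence length are assumptions made for illustration only; they are not taken from FAMSA.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (textbook DP)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so the DP row stays short
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ch_a in a:
        curr = [0]  # LCS with an empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)  # extend the common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one character
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: LCS length over the shorter sequence length."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0
```

For two protein strings, `lcs_similarity(seq_a, seq_b)` returns a value between 0 and 1; a progressive aligner would compute such a value for every pair of sequences before building a guide tree. This version takes time proportional to the product of the two lengths, so it is only suitable for small inputs.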

Citations

Pages: 13

50 entries in total

[31] ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference
Steenwyk, Jacob L.
Buida, Thomas J., III
Li, Yuanning
Shen, Xing-Xing
Rokas, Antonis
[J]. PLOS BIOLOGY, 2020, 18 (12)
[32] Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee
Chang, Jia-Ming
Di Tommaso, Paolo
Taly, Jean-Francois
Notredame, Cedric
[J]. BMC BIOINFORMATICS, 2012, 13
[33] Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee
Jia-Ming Chang
Paolo Di Tommaso
Jean-François Taly
Cedric Notredame
[J]. BMC Bioinformatics, 13
[34] Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families
Hubley, Robert
Wheeler, Travis J.
Smit, Arian F. A.
[J]. NAR GENOMICS AND BIOINFORMATICS, 2022, 4 (02)
[35] A hybrid solver for protein multiple sequence alignment problem
Chaabane, Lamiche
[J]. JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2018, 16 (04)
[36] WOAMSA: Whale Optimization Algorithm for Multiple Sequence Alignment of Protein Sequence
Kumar, Manish
Kumar, Ranjeet
Nidhya, R.
[J]. COMPUTATIONAL VISION AND BIO-INSPIRED COMPUTING, 2020, 1108 : 131 - 139
[37] CoMSA: compression of protein multiple sequence alignment files
Deorowicz, Sebastian
Walczyszyn, Joanna
Debudaj-Grabysz, Agnieszka
[J]. BIOINFORMATICS, 2019, 35 (02) : 227 - 234
[38] Multiple protein sequence alignment: Algorithms and gap insertion
Taylor, WR
[J]. COMPUTER METHODS FOR MACROMOLECULAR SEQUENCE ANALYSIS, 1996, 266 : 343 - 367
[39] Local weighting schemes for protein multiple sequence alignment
Heringa, J
[J]. COMPUTERS & CHEMISTRY, 2002, 26 (05) : 459 - 477
[40] The Impact of Multiple Protein Sequence Alignment on Phylogenetic Estimation
Wang, Li-San
Leebens-Mack, Jim
Wall, P. Kerr
Beckmann, Kevin
dePamphilis, Claude W.
Warnow, Tandy
[J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2011, 8 (04) : 1108 - 1119
