PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences

Times cited: 282
Authors
Mirarab, Siavash [1]
Nguyen, Nam [1]
Guo, Sheng [2]
Wang, Li-San [3]
Kim, Junhyong [3]
Warnow, Tandy [1, 4]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Penn, Genom & Computat Biol Grad Grp, Philadelphia, PA 19104 USA
[3] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[4] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
metagenomics; phylogenetic trees; molecular evolution; algorithms; multiple alignment; information; accuracy; families; database; trees
DOI
10.1089/cmb.2014.0156
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment from a guide tree, which enables it to be both highly scalable and very accurate. We present a study on biological and simulated datasets with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, and substantially better than trees estimated with other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.
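The abstract does not spell out how PASTA turns a guide tree into an alignment. As we understand the full paper (not reproduced in this record), the guide tree is used to split the sequences into small disjoint subsets, each subset is aligned, pairs of adjacent subsets are aligned together, and these overlapping pairwise-subset alignments are combined by transitivity. The Python sketch below illustrates only that final merge step, under the assumption that the two input alignments align their shared sequences identically (ignoring columns that are all gaps in the shared rows). The names `transitivity_merge`, `_shared_signature`, and the dictionary representation are hypothetical, and the ordering of non-anchor columns is an arbitrary choice, not necessarily PASTA's.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: sequence name -> gapped row (all rows equal length).
Alignment = Dict[str, str]
GAP = "-"


def _shared_signature(aln: Alignment, shared: List[str]) -> List[Optional[Tuple[int, ...]]]:
    """For each column, record which residue of every shared sequence sits there
    (-1 for a gap); columns where all shared sequences are gapped map to None."""
    length = len(next(iter(aln.values())))
    counters = {name: 0 for name in shared}
    signature: List[Optional[Tuple[int, ...]]] = []
    for col in range(length):
        entry = []
        for name in shared:
            if aln[name][col] == GAP:
                entry.append(-1)
            else:
                entry.append(counters[name])
                counters[name] += 1
        signature.append(tuple(entry) if any(i >= 0 for i in entry) else None)
    return signature


def transitivity_merge(a: Alignment, b: Alignment) -> Alignment:
    """Merge two alignments that overlap on a set of shared sequences.

    Assumes the shared sequences are aligned identically in `a` and `b`
    (ignoring columns that are all gaps in the shared rows). Columns that
    contain a shared residue act as anchors; all other columns are copied
    between anchors, `a`'s first and then `b`'s (an arbitrary ordering).
    """
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("alignments must share at least one sequence")
    for name in shared:
        if a[name].replace(GAP, "") != b[name].replace(GAP, ""):
            raise ValueError(f"sequence {name!r} differs between alignments")
    sig_a = _shared_signature(a, shared)
    sig_b = _shared_signature(b, shared)
    if [s for s in sig_a if s is not None] != [s for s in sig_b if s is not None]:
        raise ValueError("shared sequences are aligned differently in a and b")

    only_b = [name for name in b if name not in a]
    merged: Dict[str, List[str]] = {name: [] for name in list(a) + only_b}
    i, j = 0, 0
    while i < len(sig_a) or j < len(sig_b):
        if i < len(sig_a) and sig_a[i] is None:
            # Column unique to a: rows found only in b receive a gap.
            for name in a:
                merged[name].append(a[name][i])
            for name in only_b:
                merged[name].append(GAP)
            i += 1
        elif j < len(sig_b) and sig_b[j] is None:
            # Column unique to b: every row of a (shared rows are gapped here) gets a gap.
            for name in a:
                merged[name].append(GAP)
            for name in only_b:
                merged[name].append(b[name][j])
            j += 1
        else:
            # Matching anchor columns: homologies from a and b are joined by transitivity.
            for name in a:
                merged[name].append(a[name][i])
            for name in only_b:
                merged[name].append(b[name][j])
            i += 1
            j += 1
    return {name: "".join(row) for name, row in merged.items()}


if __name__ == "__main__":
    a = {"s1": "AC-GT", "s2": "ACTGT"}
    b = {"s2": "A-CTGT", "s3": "AGCTGT"}
    for name, row in transitivity_merge(a, b).items():
        print(name, row)
```

In the example, `s2` is the shared sequence: residues of `s1` and `s3` that are aligned to the same `s2` residue end up in the same column, so the merged rows are `s1 A-C-GT`, `s2 A-CTGT`, and `s3 AGCTGT`. Because only the shared rows are used as anchors, the merge never has to realign residues, which is one plausible reason this style of merging stays cheap even when many subsets are combined.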
Pages: 377-386
Number of pages: 10